Closed aryakaul closed 1 year ago
Hi Arya,
Thanks for using Cuttlefish! Just to make sure I understand how you're using it: currently we do not output color information explicitly. Are you inferring the color set of the maximal unitigs through an inverted-index-like approach from the GFA tilings?
Currently, in the path-tiling output of the GFA / GFA-reduced format, each tiling has a corresponding header including the following information:
Reference:x_Sequence:y
where x is the sequence-ID of the reference in input (i.e. it is the x'th FASTA file), and y is the sequence-ID of the corresponding record in that file, as discussed here. So the "Reference:x" information should provide you the corresponding color of the x'th file.
Let me know if I got your query right!
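For readers who want to color unitigs by input file, the headers described above can be turned into an inverted index. The following is a minimal sketch, not part of Cuttlefish. It assumes GFA1 `P` lines whose second tab-separated field is the `Reference:x_Sequence:y` header and whose third field is a comma-separated list of oriented segment IDs (e.g. `1+,2-`), as laid out in the GFA1 specification. The function name `unitig_colors` and the sample lines are hypothetical.

```python
from collections import defaultdict


def unitig_colors(gfa_lines):
    """Map each unitig ID to the set of reference (file) IDs whose paths use it."""
    colors = defaultdict(set)
    for line in gfa_lines:
        if not line.startswith("P\t"):
            continue  # only path (tiling) lines carry the header
        fields = line.rstrip("\n").split("\t")
        header, segments = fields[1], fields[2]
        # Assumed header shape: "Reference:x_Sequence:y"; keep only x.
        ref_id = int(header.split("_", 1)[0].split(":", 1)[1])
        for seg in segments.split(","):
            if seg:
                colors[seg.rstrip("+-")].add(ref_id)  # drop the orientation sign
    return dict(colors)


# Tiny made-up example, not real Cuttlefish output.
example = [
    "S\t1\tACGT\n",
    "P\tReference:1_Sequence:1\t1+,2-\t*\n",
    "P\tReference:2_Sequence:1\t2+,3+\t*\n",
]
print(unitig_colors(example))
```

Passing an open file handle instead of the `example` list should work the same way, since the function only iterates over lines. A unitig whose set contains a single ID would be monochromatic with respect to input files.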
Thanks for the clarification Jamshed! I think I figured out the problem. I erroneously expected these two commands to be identical:
(cuttlefish) ➜ cuttlefish build -s ../data/split0_part1.fasta -s ../data/split0_part2.fasta -t 14 -k 31 -m 28 -f 1 -w /n/scratch3/users/a/ak586/tmp_cuttlefish -o ./test_1
(cuttlefish) ➜ cat fof.txt
../data/split0_part1.txt
../data/split0_part2.txt
(cuttlefish) ➜ cuttlefish build -l ./fof.txt -t 14 -k 31 -m 28 -f 1 -w /n/scratch3/users/a/ak586/tmp_cuttlefish2 -o ./test_2
But inspecting their outputs shows that this is not the case: only the last file, ../data/split0_part2.fasta, is read by the first command:
(cuttlefish) ➜ grep '^P' ./test_1.gfa1 | cut -f2 | cut -f1 -d'_' | sort | uniq -c
34839 Reference:1
(cuttlefish) ➜ grep '^P' ./test_2.gfa1 | cut -f2 | cut -f1 -d'_' | sort | uniq -c
35252 Reference:1
34839 Reference:2
I probably just misunderstood the documentation, but in case this is not intended, I wanted to bring it to your attention. Thanks again!
Hi Arya,
You're right: the parsing of the arguments does not seem to match what we have in the documentation here. It looks like a problem with how we interface with the cxxopts library; they do mention, here, that multiple arguments can be passed as -s ... -s .... Maybe we're missing arguments because we wrap the vector in an optional, instead of using the vector directly as in their example.
Thanks for bringing this to our attention!
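The reported symptom (only the last -s value survives) is what one typically sees when a repeated option is stored as a single value rather than accumulated into a list. The sketch below reproduces that behaviour with Python's argparse purely as an analogy. It is not cxxopts and does not reflect Cuttlefish's actual parsing code, whose root cause is only hypothesized in the reply above. The file names are shortened from the commands in this thread.

```python
import argparse

argv = ["-s", "split0_part1.fasta", "-s", "split0_part2.fasta"]

# Default "store" action: each repeated -s overwrites the previous value.
overwrite = argparse.ArgumentParser()
overwrite.add_argument("-s")
last_only = overwrite.parse_args(argv).s

# "append" action: every -s occurrence is collected into a list.
accumulate = argparse.ArgumentParser()
accumulate.add_argument("-s", action="append", default=[])
all_values = accumulate.parse_args(argv).s

print(last_only)   # the second file only
print(all_values)  # both files, in command-line order
```

Until the parsing is fixed, the list-of-files form (-l) shown earlier in the thread is the invocation that was observed to read both inputs.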
Hello!
I'm interested in using cuttlefish 1.0 because of your approach to coloring that yields monochromatic unitigs. I was wondering if you had any suggestions for ways I could color by input files instead of by individual records in FASTA files. I had considered just merging my FASTA records into one record; however, that would include artificial k-mers formed by appending contigs together.
I'm aware this is not the recommended usage of cuttlefish, but for our research question this approach is necessary. Do you have any suggestions for ways to easily do this? Thank you!
Best, Arya
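To make the concern about merged records concrete, here is a small sketch of how concatenating contigs introduces k-mers that exist in no individual record. It is an illustration only: the helper names `kmers` and `artificial_kmers` and the two toy contigs are made up, and it works on plain forward-strand k-mers rather than the canonical k-mers a de Bruijn graph tool would use. Each junction can contribute up to k-1 such spanning k-mers.

```python
def kmers(seq, k):
    """Return the set of k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def artificial_kmers(records, k):
    """k-mers present in the concatenation of records but in no single record."""
    genuine = set().union(*(kmers(r, k) for r in records))
    return kmers("".join(records), k) - genuine


# Two made-up contigs; with k = 3 the junction "TA|GG" creates new 3-mers.
records = ["ACGTA", "GGCTT"]
print(sorted(artificial_kmers(records, 3)))  # ['AGG', 'TAG']
```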