fplazaonate closed 10 months ago
This is somewhat of a tricky issue...
The way the CLI is designed, -o only works for genomes because all genomes are grouped together, so they can all be renamed at once. There is no ambiguity.
But because sylph can sketch both reads and genomes with the sketch option, it's not clear how -o should work for reads when genomes are also present. This is why -d is reserved for reads and -o for genomes.
In sylph v0.5, I am adding an option called --sample-names so that users can rename read sketch files to a list of sample names. This is probably what one wants for the -o option for reads.
If you have specific ideas on what -o should output for reads, let me know. For now, I will add a warning for when the user only uses -o for sketching reads.
IMO, sylph sketch should treat all reads as coming from a single sample and generate a single sylsp file, no matter how many fastq files are provided. In this case, multiple fastq files would be multiple sequencing runs of the same library.
My lab and others generate multiple fastq files per sample to reach a target sequencing depth. Currently, the sylph interface is not very convenient for that purpose.
The current solution is to decompress all the files on the fly:
-r <(zcat *.fastq.gz)
In the end, the output file has the name of the file descriptor (e.g. 63.sylsp) and has to be renamed afterwards.
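An alternative to the process substitution, at the cost of some disk space, is to merge the runs into one file that already carries the sample name before sketching. The following is only a minimal bash sketch: `merge_runs` and the file names are hypothetical, and I have not checked how sylph names the resulting sketch, only that its input would then be a regular, sample-named file rather than a file descriptor.

```shell
# merge_runs SAMPLE RUN1.fastq.gz [RUN2.fastq.gz ...]
# Hypothetical helper (not part of sylph): decompresses every run of one
# library and recompresses the combined reads into SAMPLE.fastq.gz.
merge_runs() {
    local sample="$1"
    shift
    # "gzip -dc" behaves like zcat; the runs are written out in argument order.
    gzip -dc "$@" | gzip -c > "${sample}.fastq.gz"
}

# Example usage (file names are placeholders):
#   merge_runs sampleA sampleA_run1.fastq.gz sampleA_run2.fastq.gz
#   sylph sketch -r sampleA.fastq.gz
```

This keeps one input file per sample, so no renaming of a descriptor-named output is needed afterwards.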
Hmm very interesting. Thanks for the input.
I think I will keep this format for now because most software I'm aware of only processes one read pair per sample. What you're saying makes sense, perhaps as an optional mode of input.
I will add an option for renaming in sylph v0.5 though.
Hi @bluenote-1577,
The -o option seems to be ignored when sketching samples.
Could you fix this?