Closed katkopera closed 1 year ago
hi, thanks for the comment! can i see your command used to run the assembly module real quick? thanks
I am not sure if you have fully read the description of the problem I am reporting.
Of course I can define the samples.csv file myself, and I know how to do it, but my point is that the QC module documentation suggests that I can use the QC output directly (without editing) in subsequent modules. And I assume that this is what you intended.
Did you have both the 'tadpole' and 'bayeshammer' methods from the beginning? It looks like the second method was added at a later stage and the Snakefile was not fully updated after the change.
If users are to use the samples.csv file directly (output from QC) then the Snakefile needs to be changed.
oh i see! that makes a lot of sense. i did read the full description, but was hoping that the cause of the problem was somewhat simpler and that would be the best scenario because it would be easy to fix.
also, when did you last pull the repository? while i was not the last person to edit the code, i did see that for this pipeline to finish running, the samples.csv file you mentioned has to be generated, in case the user needs it for the next step, as seen in line 67 of the Snakefile.
if you wish we could potentially do a really quick zoom call before 4pm today and see if we can troubleshoot. maybe that can help you with the issue and help me understand where the problem came from as well. thanks!
also as a sidenote, tadpole was added later than bayeshammer - if there's some problem running bayeshammer feel free to let me know as well! bayeshammer is going to be a lot slower than tadpole, that's why we added tadpole, but hopefully a lot more accurate (which also potentially runs the risk of removing more reads and leading to a really small or empty resulting sequence file).
or if you could provide the command you ran that reproduces the error, so we can look into it, that'd be fabulous as well. thanks!
The command I ran was:
python /net/ascratch/people/plgkkopera/camp_tests/short-read-assembly/camp_short-read-assembly/workflow/short-read-assembly.py -c 20 -d /net/ascratch/people/plgkkopera/camp_tests/short-read-assembly -s /net/ascratch/people/plgkkopera/camp_tests/short-read-quality-control/short_read_qc/final_reports/samples.csv
I was able to remove the error by manually adding the 'tadpole' subdirectory to the fwd and rev sample paths.
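For anyone hitting the same failure before the fix lands, the manual workaround above could be scripted roughly as follows. This is only a sketch under assumptions that the thread does not confirm: that samples.csv has a header row, that the read-path columns are named `illumina_fwd` and `illumina_rev`, and that the missing subdirectory sits directly above the FASTQ file name. `add_subdir` and `patch_samples` are hypothetical helper names, not part of the module.

```python
import csv
from pathlib import PurePosixPath


def add_subdir(path_str, subdir):
    """Insert `subdir` between a file's parent directory and its file name."""
    p = PurePosixPath(path_str)
    if p.parent.name == subdir:
        # Path already points into the subdirectory; leave it unchanged.
        return path_str
    return str(p.parent / subdir / p.name)


def patch_samples(src, dst, subdir="tadpole",
                  columns=("illumina_fwd", "illumina_rev")):
    """Copy `src` to `dst`, rewriting the read-path columns.

    The column names are assumptions; adjust them to the actual header.
    """
    with open(src, newline="") as fin:
        reader = csv.DictReader(fin)
        fieldnames = reader.fieldnames
        rows = list(reader)

    for row in rows:
        for col in columns:
            row[col] = add_subdir(row[col], subdir)

    with open(dst, "w", newline="") as fout:
        writer = csv.DictWriter(fout, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `patch_samples("samples.csv", "samples_fixed.csv")` would write a patched copy and leave the original untouched; for the bayeshammer option, pass the matching subdirectory name via `subdir` (whatever the Snakefile actually uses).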
This will be fully fixed in the next version of the module. Thanks!
Hi, one of the outputs of the module is
/path/to/work/dir/short_read_qc/final_reports/samples.csv
which is supposed to be ingested by the next module, e.g. camp_short-read-assembly. The paths to the forward and reverse reads in this output file (when using tadpole) are incomplete, which leads to the failure of the next modules, for example in camp_short-read-assembly:
In rule make_config of the Snakefile you specify the final fastqs directory as:
but in rule filter_seq_errors_tp the paths are encoded differently:
so the 'tadpole' subdirectory is missing from the paths in the samples.csv file.
The same applies to the bayeshammer option. Please fix that :)
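A quick way to confirm this kind of mismatch before launching the next module is to check that every read path listed in samples.csv exists on disk. The sketch below assumes a header row with columns named `illumina_fwd` and `illumina_rev`, which this thread does not confirm; `missing_read_files` is a hypothetical helper, not something the module provides.

```python
import csv
from pathlib import Path


def missing_read_files(samples_csv,
                       columns=("illumina_fwd", "illumina_rev")):
    """Return (line number, column, path) for each listed file that is absent.

    The column names are assumptions; adjust them to the actual header.
    """
    missing = []
    with open(samples_csv, newline="") as fh:
        reader = csv.DictReader(fh)
        for row in reader:
            for col in columns:
                if not Path(row[col]).is_file():
                    # reader.line_num is the CSV line just read (header is 1)
                    missing.append((reader.line_num, col, row[col]))
    return missing
```

An empty list means every listed path resolves to a file; any entries point at the rows whose paths lack the expected subdirectory.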