Closed Jakob37 closed 2 months ago
No worries, we are happy to get issues and improve the pipeline 😄
At the moment, Tomte is designed to run only a few samples against an already existing database, not to create one. However, we have been working on modifying the code so that an actual database can be created. Here is the PR. We still need to test it thoroughly, but if you want to give it a try, feel free to run it and suggest improvements.
OK, great! I'll give it a go. That sounds exactly like what we will need going forward.
Managed to get the DROP run started anyway without any reference db. We will see how that goes ...
Please let me know if it works. If it doesn't, it is better to restart DROP outside of the pipeline, as explained here
Thanks for the tips! That will be very helpful.
It made it pretty far (edit: not super far, a little bit), into the Counting_Summary step. Will see if I can figure that out today 🤔
The aberrant expression run went through 🎉 I needed to remove the following columns from the produced sample_annot.tsv file: GENE_COUNTS_FILE, SEX. Otherwise both were produced filled with NA values, which DROP could not handle downstream. It seems its R parsing 'cleverly' translates the string "NA" to nan.
I raised an issue in DROP about it: https://github.com/gagneurlab/drop/issues/568
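For anyone hitting the same thing, here is a minimal sketch of the column clean-up described above, using only the Python standard library. The helper name `drop_all_na_columns` is hypothetical and not part of Tomte or DROP, and the other column names in the usage note are only illustrative. It removes GENE_COUNTS_FILE and SEX only when every value in the column is "NA" or empty, so real values are left alone.

```python
import csv
import io

# Values treated as "missing" in this sketch (an assumption, adjust as needed).
NA_TOKENS = {"", "NA"}


def drop_all_na_columns(tsv_text, candidates=("GENE_COUNTS_FILE", "SEX")):
    """Return TSV text without the candidate columns that contain only NA/empty values.

    Hypothetical helper for illustration; not part of Tomte or DROP.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    fieldnames = reader.fieldnames or []

    # A candidate column is dropped only if it exists and every row is NA/empty.
    to_drop = {
        col
        for col in candidates
        if col in fieldnames and all((row[col] or "") in NA_TOKENS for row in rows)
    }
    kept = [col for col in fieldnames if col not in to_drop]

    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=kept,
        delimiter="\t",
        extrasaction="ignore",  # silently skip the dropped columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Usage would be something like reading sample_annot.tsv into a string, passing it through `drop_all_na_columns`, and writing the result back before restarting DROP. A column such as SEX that holds at least one real value is kept unchanged.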
Still running the splicing run. It filled our RAM when running with 64 threads, but seems to be doing fine on a smaller number of threads (12). Might require some further fiddling with the sample_annot.tsv file to get it through downstream steps I guess, we will see.
Have you guys btw considered running OUTRIDER and FRASER2 outside the DROP pipeline? It seems the handful of steps could be lifted over from Snakemake to one or two Nextflow subworkflows. This would make things much cleaner in terms of debugging, resuming and caching, and would mean fewer dependencies on DROP.
I realize this would mean considerable extra work to set up, and it might not be feasible. Just a thought!
The FRASER2 pipeline ran pretty far, but crashed in one of the final FRASER calculation steps. It seems to be a bug that only appears when not using external counts: https://github.com/gagneurlab/drop/issues/558
Description of feature
Right now it looks like drop_sample_annot.py expects a reference. In this case, I have a large run with 100 samples, which would serve as their own reference.
This does not seem to be supported currently. If no external reference is supplied, DROP will not run.
I am working around this by supplying an empty reference and making some changes to drop_sample_annot.py so that it does not crash if no data is present in the df. It would be helpful to have an "official" way to do this. What do you think?
Sorry about the issue bombardment today 🫣
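To make the request concrete, here is a rough sketch of the behaviour I have in mind, written with plain lists of dicts rather than the actual df handling. The function name `combine_annotations` and the `RNA_ID` key are assumptions for illustration only; this is not how drop_sample_annot.py is implemented today.

```python
def combine_annotations(sample_rows, reference_rows=None):
    """Combine cohort rows with optional external reference rows.

    Hypothetical sketch: both arguments are lists of dicts keyed by column
    name, and "RNA_ID" is assumed to be the sample identifier column.
    """
    samples = [dict(row) for row in sample_rows]

    # No (or empty) external reference: the cohort serves as its own reference,
    # so return the samples unchanged instead of failing on missing data.
    if not reference_rows:
        return samples

    # Otherwise append reference rows, skipping IDs already in the cohort.
    sample_ids = {row["RNA_ID"] for row in samples}
    extra = [dict(row) for row in reference_rows if row["RNA_ID"] not in sample_ids]
    return samples + extra
```

With this shape, calling `combine_annotations(rows)` or `combine_annotations(rows, [])` would both simply return the cohort, which is the case I am currently patching around.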