NCGG-MGC / IMSindel

IMSindel: An accurate intermediate-size indel detection tool incorporating de novo assembly and gapped global-local alignment with split read analysis
https://www.nature.com/articles/s41598-018-23978-z
MIT License

Missing /data and /test folders after pulling IMSindel from Bioconda #12

Closed davidkundrat closed 4 years ago

davidkundrat commented 4 years ago

Hello everyone! I found out about IMSindel yesterday and wanted to give it a try, so I installed it from the Bioconda repository. Everything went fine up to the step ">6. detection of indels...done", but then my process log ends and no files were output. When I logged into the terminal, I noticed a warning on screen saying that "mydna.mat" was missing. I looked into the Bioconda files and noticed that the /data and /test folders are missing altogether, which seems suspicious. I will manually download those folders, move them into the Bioconda installation, and run it again, but this might be an issue for other users who install from Bioconda, so I wanted to let you know. In any case, could you please explain what the "mydna.mat" matrix is used for in the analysis? Is it a fixed file for all analyses, or does it change for each analysis/sample? Also, what is the purpose of the "test_sam_row.rb" script? Thank you very much in advance.

Best regards, David Kundrat

guillaumecharbonnier commented 4 years ago

Hi David, I am not an IMSindel developer, but I made the Bioconda package to test it. Since I was able to get consistent files named 1.out from running the Bioconda version, I did not check the log. Now I see the warning too. I left out the test folder to save Bioconda CircleCI resources, and I had assumed the data folder was only required for tests. I am also waiting for the developers' explanation of mydna.mat and can add it to the Bioconda package if needed.

holrock commented 4 years ago

Hi, mydna.mat is an input file for FASTA (glsearch36). IMSindel needs the file to call glsearch36.

@guillaumecharbonnier Thanks for the reply. Can you include the data directory in the Bioconda package?

davidkundrat commented 4 years ago

Hello again, everyone! Just an update on my progress: I manually added the /data folder to the Bioconda installation and also included the --glsearch-mat option in my call, just to be sure that IMSindel can find the file. The analysis went through and I got a table of indels, so everything seems to be in order. However, the command took a really long time: about 2 hours in total. The step that took the longest was ">4. making consensus seqs from support reads...", while the other steps finished quickly. My input was a 310 MB BAM file (standard Illumina sequencing on a MiSeq, mapped with BWA using default settings). I selected only chromosome 13 for analysis and ran the command with 12 threads on our Debian server. For comparison, I ran the same analysis with Pindel, which has shown good results in FLT3 indel detection on chr13 (https://www.sciencedirect.com/science/article/pii/S1525157812002590), and the whole process took about 5 minutes, if not less. I am by no means an expert in this field, so I can't say what might cause such a big difference; I am just letting you know about my experience so far.

Best regards, David Kundrat

holrock commented 4 years ago

Hi David, I recommend the --thread CPU_NUM and --temp /dev/shm options for a speedup. But the IMSindel implementation is slow...
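
For readers who script their runs, here is a minimal sketch of appending the options named in this thread (--thread, --temp, --glsearch-mat) to an existing argument list. The executable name `imsindel`, the helper name `with_speedup_options`, and the path `data/mydna.mat` are assumptions made for illustration; the input options a real run needs are not shown and should be taken from the IMSindel README.

```python
import os
import shlex


def with_speedup_options(base_args, temp_dir="/dev/shm", threads=None, glsearch_mat=None):
    """Append the options discussed in this thread to an existing argument list."""
    args = list(base_args)
    # --thread CPU_NUM: default to the number of CPUs visible to Python.
    cpu_num = threads if threads is not None else (os.cpu_count() or 1)
    args += ["--thread", str(cpu_num)]
    # --temp /dev/shm: keep temporary files on a RAM-backed filesystem (Linux).
    args += ["--temp", temp_dir]
    # --glsearch-mat: explicit path to mydna.mat, only if the caller supplies one.
    if glsearch_mat is not None:
        args += ["--glsearch-mat", glsearch_mat]
    return args


# Hypothetical base command: the executable name is a placeholder and the
# required input options are intentionally omitted.
base = ["imsindel"]
cmd = with_speedup_options(base, threads=12, glsearch_mat="data/mydna.mat")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once the remaining required options have been added. Note that /dev/shm is backed by RAM, so the temporary files count against available memory.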

davidkundrat commented 4 years ago

Thanks for the good tips, @holrock. I will try running some more IMSindel commands with the options you suggested. Also, since I am only interested in indels on chr13, I will prepare a reference index just for that chromosome; that might speed up the process some more. In any case, it seems the /data folder was successfully added to the Bioconda package (thanks, @guillaumecharbonnier), so I suppose we can close this issue :) Thanks to all for your support!

Best regards, David
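
The single-chromosome reference mentioned above could be prepared roughly as follows. This is a minimal pure-Python sketch, not part of IMSindel: the function name `extract_sequence` and the demo FASTA content are made up for illustration, and whether a smaller reference actually shortens the run was not confirmed in this thread.

```python
import io


def extract_sequence(fasta_lines, name):
    """Yield the lines of the FASTA record whose header ID equals `name`."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] == name
        if keep:
            yield line


# Tiny made-up FASTA used only to demonstrate the function.
demo = io.StringIO(">chr12 example\nACGT\n>chr13 example\nGGCC\nTTAA\n>chr14\nCCCC\n")
chr13_only = "".join(extract_sequence(demo, "chr13"))
print(chr13_only, end="")
```

For a real reference, the same function can be applied to an open file handle and the result written to a new FASTA file. The sequence name has to match the one used in the BAM header (for example "chr13" versus "13"), since the alignments refer to the reference by name.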