danielpodlesny / samestr

SameStr identifies shared strains between pairs of metagenomic samples based on the similarity of SNV profiles.
GNU Affero General Public License v3.0

Convert getting stuck #8

Closed DenaEnnis closed 1 year ago

DenaEnnis commented 1 year ago

Hi, Thanks for this great tool.

I am trying to run convert but am encountering a problem. For each sample, I get the message 'Running: cat for 0 gene_files' while it is running, and then the run gets stuck there, no matter how many threads and how much time I give it.

I converted my metaphlan database as instructed and also checked that it looks as it should.

Any suggestions? Thank you

danielpodlesny commented 1 year ago

Hi @DenaEnnis,

Thanks for your interest in SameStr!

Which MetaPhlAn db version are you using? Could you also provide the commands that you used, including for samestr db?

DenaEnnis commented 1 year ago

Hi, I am using metaphlan 3. I changed the metaphlan text output to have a one line header and two columns. I did not change the sam output, but I didn't see that it matters (does it?).

My commands for samestr db:

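import bz2
import pickle

mpa_pkl_file = 'mpa_v296_CHOCOPhlAn_201901.pkl'

# Load the bz2-compressed MetaPhlAn pickle
with bz2.BZ2File(mpa_pkl_file) as f_in:
    mpa_pkl = pickle.load(f_in)

# Re-save it with pickle protocol 0 as a .py2.pkl file, closing the file when done
with bz2.BZ2File(mpa_pkl_file.replace('.pkl', '.py2.pkl'), 'wb') as f_out:
    pickle.dump(mpa_pkl, f_out, protocol=0)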

My commands for samestr convert:

samestr convert \
--input-files /sci/labs/morani/morani/icore-data/lab/Projects/Dena/Sequencing_results/Exp22_27/mpa_sams/*sam.bz2 \
--marker-dir /sci/labs/morani/morani/icore-data/lab/Projects/Dena/Sequencing_results/Exp22_27/samestr/ \
--output-dir out_convert/ \
--mp-profiles-dir /sci/labs/morani/morani/icore-data/lab/Projects/Dena/Sequencing_results/Exp22_27/samestr/mpa/ \
--nproc 10

Thank you

danielpodlesny commented 1 year ago

Your commands look good, you are just missing one step. So far, you have only followed the additional compatibility notes for using samestr db with MetaPhlAn ≥3, which convert the pickle file.

You now need to actually run the samestr db command to set up the database:

samestr db \
--mpa-pkl mpa_v30_CHOCOPhlAn_201901.py2.pkl \
--mpa-markers mpa_v30_CHOCOPhlAn_201901.fna \
--output-dir marker_db/

This will create and format the gene_files that were missing, which is why convert reported 0 gene_files and got stuck. After that, you can continue with samestr convert as you have, using --marker-dir marker_db/ as an option.

Note that --output-dir marker_db/ is used as an example - you can name the output directory as you like, but make sure to specify it in the next steps accordingly.
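As a quick sanity check before starting samestr convert, it may help to confirm that the marker directory is not empty. The following is only a minimal sketch: `count_files` is a hypothetical helper, not part of SameStr, and it assumes that samestr db writes its output files somewhere under the directory passed as --output-dir. It does not check file names or contents, since those are not described in this thread.

```python
from pathlib import Path


def count_files(directory):
    """Return the number of regular files found anywhere under `directory`.

    Hypothetical helper for illustration; returns 0 if the directory
    does not exist.
    """
    root = Path(directory)
    if not root.is_dir():
        return 0
    return sum(1 for path in root.rglob("*") if path.is_file())


if __name__ == "__main__":
    # "marker_db/" is the example --output-dir from the samestr db command above.
    marker_dir = "marker_db/"
    n_files = count_files(marker_dir)
    if n_files == 0:
        print(f"{marker_dir} is missing or empty: run samestr db before samestr convert")
    else:
        print(f"{marker_dir} contains {n_files} files")
```

If the count is zero, the directory passed to --marker-dir has most likely not been populated by samestr db yet.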

Let me know if it works.

DenaEnnis commented 1 year ago

It worked, thank you! Sorry I missed that part.