Closed StickHu closed 1 year ago
Hi @StickHu,
Thanks for trying out SameStr!
I think the problem might be with the database build. You used the virus db mpa_vJan21_CHOCOPhlAnSGB_202103_VSG.fna
to build the samestr db
but the metaphlan profiles generated when using the mpa_vJan21_CHOCOPhlAnSGB_202103
index also include marker hits from Bacteria, Archaea, and Eukarya found in mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.fna
.
The latter is omitted with your command. Hence, there is a mismatch between what is detected with MetaPhlAn (you didn't specify search for viruses) and what is queried with SameStr (you submitted only the virus db).
You could solve this in three ways:
mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.fna
file should have been generated. Use that as input.mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.fna
file and only care about viruses, change the input to mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.fna
mpa_vJan21_CHOCOPhlAnSGB_202103_SGB.fna
file but want to include both databases, concatenate both .fna
files with cat
and use that as input.Let me know if this solved it for you.
I will update the documentation to include instructions on MetaPhlAn4.
Thanks for your help. And it works successfully now.
Amazing work for samestr!But I met a problem when using it. The metaphlan command are following.
metaphlan ${m}_bac.1.fastq.gz,${m}_bac.2.fastq.gz \ --nproc 10 --input_type fastq \ --legacy-output \ -o ${m}_mp.profile.txt \ -t rel_ab \ --bowtie2db $metaphlan_path \ --index mpa_vJan21_CHOCOPhlAnSGB_202103 \ --bowtie2out ${m}_mp.bowtie2out \ --samout ${m}_mp.sam.bz2
And I have done the command recommanded:
within python 3 environment
import pickle import bz2 mpa_pkl_file = 'mpa_vJan21_CHOCOPhlAnSGB_202103.pkl' mpa_pkl = pickle.load(bz2.BZ2File(mpa_pkl_file))
f = bz2.BZ2File(mpa_pkl_file.replace('.pkl', '.py2.pkl'), 'wb') pickle.dump(mpa_pkl, f, protocol = 0)
Then built the samestr_db: samestr db --mpa-pkl /db/mpa_vJan21_CHOCOPhlAnSGB_202103.py2.pkl
--mpa-markers /db/mpa_vJan21_CHOCOPhlAnSGB_202103_VSG.fna --output-dir samestr_db
Then do the samestr convert samestr convert --input-files sam_bz2/*_mp.sam.bz2 --marker-dir samestr_db/ --min-vcov 5 --nprocs 10 --output-dir out_convert/
But the index problem has occurred:
2022-12-03 17:54:34,367 | DEBUG | samestr.convert._bam2freq | bam2freq | 23 | Piling: /data/workdir/huwa/oral_DB/subgingival_plaque/PRJNA552294/samestr_result/out_convert/bam/SRR9684976_mp.bam parse /data/workdir/huwa/oral_DB/subgingival_plaque/PRJNA552294/samestr_result/out_convert/bam/SRR9684976_mp.bam Traceback (most recent call last): File "/data/workdir/huwa/software/miniconda3/envs/samestr/bin/kp2np.py", line 38, in
contig = line[0]
IndexError: list index out of range
Best Stick Hu