Open marina-vn opened 11 months ago
Hello @marina-vn - could you let me know what files you'd like to have and are missing? Also, are you mapping to a single genome or multiple genomes?
Thanks, Matt
Hi Matt, I'm mapping a temporal serie to a single genome and I was hopping to get some figure that shows a comparison of the samples based on SNVs, to see how the population changes month by month.
I'm looking into the "plot" module, but I'm not sure on how to use it with the "compare" results. It says that "Skipping plot 10 - you don't have all required information. You need to run inStrain genome_wide first" but in the manual it reads that if you have provided the gene file in the "profile" step is better, but thought I have done that I'm getting this message. Also, I'm not sure where it's generating the other 9 plots, because thay aren't in the directory that I'm providing not in the one that I'm working in
Hi @marina-vn - OK this makes sense. If you want to dig into this on a SNV-by-SNV level I would consider also adding .bam files to your inStrain compare command in order to run the SNV Pooling functions.
1) I think this is a small bug with the inStrain compare module, where it wants to have a scaffold to bin file to make that plot. I think the best workaround would just be to make an .stb file that just has you genome, and pass that to inStrain compare. That should make it make plot 10. Info on that is here: https://instrain.readthedocs.io/en/master/user_manual.html#preparing-a-scaffold-to-bin-file
2) Plots 1-9 are generated within inStrain profile, not inStrain compare.
Best, Matt
Hi Matt
I have added the bams files an the scaffold to bin, but I'm getting this error:
Traceback (most recent call last):
File "/home/emm4/anaconda3/envs/inStrain2/bin/inStrain", line 31, in <module>
inStrain.controller.Controller().main(args)
File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/controller.py", line 57, in main
self.compare_operation(args)
File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/controller.py", line 89, in compare_operation
inStrain.compare_controller.CompareController(args).main()
File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/compare_controller.py", line 46, in main
self.parse_arguments()
File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/compare_controller.py", line 150, in parse_arguments
self.name2bam = gen_name_2_bam(self.inputs, args.bams, self.kwargs.get('processes', 1))
File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/compare_controller.py", line 679, in gen_name_2_bam
name2bam[name] = inStrain.profile.samtools_ops.prepare_bam_fie(bam, processes)
File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/profile/samtools_ops.py", line 61, in prepare_bam_fie
pysam.AlignmentFile(bam).mapped
File "pysam/libcalignmentfile.pyx", line 748, in pysam.libcalignmentfile.AlignmentFile.__cinit__
File "pysam/libcalignmentfile.pyx", line 958, in pysam.libcalignmentfile.AlignmentFile._open
File "pysam/libchtslib.pyx", line 361, in pysam.libchtslib.HTSFile.check_truncation
OSError: no BGZF EOF marker; file may be truncated
I have installed the program this afternoon using the following command:
conda install -c conda-forge -c bioconda -c defaults instrain
Is there anything else that I should doregarding the installation to correct this error?
Hi @marina-vn - that error means that pysam doesn't like something about the .bam files you're providing. If you provide the same .bam files that you provided to inStrain profile it should work (since it's the same pysam
code in both inStrain profile and inStrain compare)
Best, Matt
Hi @MrOlm ,
I have submitted the bams in the same order than the profiles and the scaffold-to-bin file, and now I'm getting this error
..:: inStrain compare Step 3. Auxiliary processing ::..
Traceback (most recent call last): File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3653, in get_loc return self._engine.get_loc(casted_key) File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: 'cryptic'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/emm4/anaconda3/envs/inStrain2/bin/inStrain", line 31, in <module>
inStrain.controller.Controller().main(args)
File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/controller.py", line 57, in main
self.compare_operation(args)
File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/controller.py", line 89, in compare_operation
inStrain.compare_controller.CompareController(args).main()
File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/compare_controller.py", line 73, in main
self.run_auxillary_processing()
File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/compare_controller.py", line 297, in run_auxillary_processing
PM = inStrain.polymorpher.PoolController(self.SC_objects, self.name2bam)
File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/polymorpher.py", line 65, in __init__
db = db[db['cryptic'] == False]
File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/pandas/core/frame.py", line 3761, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3655, in get_loc
raise KeyError(key) from err
KeyError: 'cryptic'
huh- could you send me the log file please? and can you confirm that each of your inStrain profile objects has at least 1 SNV called in the SNV output tables?
Best, Matt
Hi,
I have checked the profiles and removed two that didn't have any SNV, run it again and it generated the output but not the figure:
`*** ..:: inStrain compare Step 1. Load data ::..
Loading Profiles into RAM: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 82/82 [00:01<00:00, 42.79it/s] 1 of 1 scaffolds are in at least 2 samples
..:: inStrain compare Step 2. Run comparisons ::..
Running group 1 of 1 Comparing scaffolds: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [03:33<00:00, 213.40s/it]
..:: inStrain compare Step 3. Auxiliary processing ::..
Pulling 74482 SNVs from 82 scaffolds. This should take ~ 1.4 min in total Pulling SNVs from BAMs: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 82/82 [00:03<00:00, 23.05it/s] Generating pooled SNV summary table: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.46it/s] Cannot cluster genome 37F6.fasta; 14 of 3321 comaprisons involve no genomic overlap at all: see log for more
..:: inStrain compare Step 4. Store results ::..
making plots 10 Plotting plot 10 Cannot cluster genome 37F6.fasta; 14 of 3321 comaprisons involve no genomic overlap at all: see log for more Done! $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
..:: inStrain compare finished ::..
Output tables........ /scr/marina_scr/MG20/bamreads/MG_mapped_only_iSp_COMPARE_bams_noempty/output/ Figures.............. /scr/marina_scr/MG20/bamreads/MG_mapped_only_iSp_COMPARE_bams_noempty/figures/ Logging.............. /scr/marina_scr/MG20/bamreads/MG_mapped_only_iSp_COMPARE_bams_noempty/log/
See documentation for output descriptions - https://instrain.readthedocs.io/en/latest/
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$`
If I remove the 3 different samples that appear to be a problem (Cannot cluster genome 37F6.fasta; 14 of 3321 comaprisons involve no genomic overlap at all: see log for more) it does generate the dendogram, but I was wondering why those samples suppose a problem when they are correctly mapped and are present in all the output files
Also, is there a way to represent the similarty between samples that isn't the dendogram? With 82 different samples, it's not a very legible way of understanding the results, I was wondering if there's an easy way of generaint a double entry table/heatmap that depicts the similarity between the samples
Best,
Hi @marina-vn -
Yes- I agree with that many samples a dendrogram can become very unweildy! I would recommend making heatmaps from the genomewide-compare.tsv (https://instrain.readthedocs.io/en/latest/example_output.html#genomewide-compare-tsv) . The link above should make it pretty straightforward for you to do so, but please let me know if you run into trouble.
Best, Matt
I'm running the program "instrain Compare" (v. 1.8.0) and it only generates one file in the output directory (MG_mapped_only_iSp_COMPARE_comparisonsTable.tsv) and no figures.
The command that I'm running is:
inStrain compare -i BL090113_mapped_only_iSp/ BL090210_mapped_only_iSp/ BL090317_mapped_only_iSp/ BL090421_mapped_only_iSp/ BL090512_mapped_only_iSp/ BL090609_mapped_only_iSp/ BL090721_mapped_only_iSp/ BL090804_mapped_only_iSp/ BL090915_mapped_only_iSp/ BL091006_mapped_only_iSp/ BL091105_mapped_only_iSp/ BL091222_mapped_only_iSp/ BL100120_mapped_only_iSp/ BL100217_mapped_only_iSp/ BL100322_mapped_only_iSp/ BL100413_mapped_only_iSp/ BL100525_mapped_only_iSp/ BL100622_mapped_only_iSp/ BL100706_mapped_only_iSp/ BL100803_mapped_only_iSp/ BL100914_mapped_only_iSp/ BL101019_mapped_only_iSp/ BL101116_mapped_only_iSp/ BL101214_mapped_only_iSp/ BL110112_mapped_only_iSp/ BL110208_mapped_only_iSp/ BL110314_mapped_only_iSp/ BL110412_mapped_only_iSp/ BL110517_mapped_only_iSp/ BL110615_mapped_only_iSp/ BL110704_mapped_only_iSp/ BL110802_mapped_only_iSp/ BL110913_mapped_only_iSp/ BL111010_mapped_only_iSp/ BL111129_mapped_only_iSp/ BL111219_mapped_only_iSp/ BL120110_mapped_only_iSp/ BL120214_mapped_only_iSp/ BL120313_mapped_only_iSp/ BL120411_mapped_only_iSp/ BL120518_mapped_only_iSp/ BL120620_mapped_only_iSp/ BL120703_mapped_only_iSp/ BL120807_mapped_only_iSp/ BL120913_mapped_only_iSp/ BL121009_mapped_only_iSp/ BL121106_mapped_only_iSp/ BL121211_mapped_only_iSp/ BL130115_mapped_only_iSp/ BL130206_mapped_only_iSp/ BL130312_mapped_only_iSp/ BL130417_mapped_only_iSp/ BL130507_mapped_only_iSp/ BL130604_mapped_only_iSp/ BL130709_mapped_only_iSp/ BL130801_mapped_only_iSp/ BL130917_mapped_only_iSp/ BL131015_mapped_only_iSp/ BL131105_mapped_only_iSp/ BL131204_mapped_only_iSp/ BL140114_mapped_only_iSp/ BL140211_mapped_only_iSp/ BL140310_mapped_only_iSp/ BL140407_mapped_only_iSp/ BL140505_mapped_only_iSp/ BL140602_mapped_only_iSp/ BL140707_mapped_only_iSp/ BL140804_mapped_only_iSp/ BL140916_mapped_only_iSp/ BL141007_mapped_only_iSp/ BL141111_mapped_only_iSp/ BL141216_mapped_only_iSp/ BL150113_mapped_only_iSp/ BL150210_mapped_only_iSp/ BL150310_mapped_only_iSp/ BL150415_mapped_only_iSp/ BL150512_mapped_only_iSp/ BL150609_mapped_only_iSp/ BL150707_mapped_only_iSp/ BL150804_mapped_only_iSp/ BL150916_mapped_only_iSp/ BL151006_mapped_only_iSp/ BL151110_mapped_only_iSp/ BL151215_mapped_only_iSp/ -p 30 -d -o MG_mapped_only_iSp_COMPARE2
And the it seems to without any issue, but it doesn't generate the output that I need:
Any idea of what could be the issue and hot to fix it?
Best,
log.log