Status: Closed (closed by Sanrrone 2 months ago)
Hello,
This problem results from the .stb file used in the profile step differing from the one used in the compare step. Specifically, if you look at the genome_info.tsv file of your sample "HeP-1057-10", you will find the genome "22903", but this genome is not in the provided .stb file.
Best, Matt
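For anyone wanting to run this check themselves, here is a minimal sketch. It lists every genome that appears in a genome_info.tsv but not in the bin column of the .stb file. The function name `missing_genomes` is hypothetical, and the sketch assumes the genome name is the first tab-separated column of genome_info.tsv, below a single header line. Adjust it if your file is laid out differently.

```shell
# Hypothetical helper: list genomes named in genome_info.tsv that are
# absent from the bin column (column 2) of a scaffold<TAB>bin .stb file.
# Assumption: genome names are in column 1 of genome_info.tsv, after one header line.
missing_genomes() {
    # $1 = genome_info.tsv, $2 = .stb file
    awk -F '\t' '
        NR == FNR { bins[$2] = 1; next }          # first file (.stb): remember every bin name
        FNR > 1 && !($1 in bins) { print $1 }     # second file: skip header, report unknown genomes
    ' "$2" "$1"
}

# Example usage (paths are placeholders):
# missing_genomes path/to/genome_info.tsv path/to/hg.tsv
```

Empty output means every genome in the profile has a matching bin name in the .stb file, at least as plain text.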
Indeed, '22903' is in the genome_info file. However, the .stb file (hg.tsv in my case) is the same in both executions (profile and compare).
$ grep -w 22903 /scratch/project_2007362/software/HumGutDB/hg.tsv
kraken:taxid|3020030|HumGut_20030_1 22903
So I do not understand what causes the error. Is it because I am using numbers instead of strings in the bin column?
Just in case, my parameters are:
#profile
inStrain profile --use_full_fasta_header -p $c -c 7 --min_scaffold_reads 7 -s $new/software/HumGutDB/hg.tsv --skip_plot_generation -o ${sname} $bam $new/software/HumGutDB/hg.fasta --database_mode
#compare
inStrain compare -o ${hep}_compare --skip_plot_generation -p $c -s $new/software/HumGutDB/hg.tsv -i $samples --database_mode -d
Hello,
Urg, I do worry it might be due to using numbers instead of strings for bin names. I thought I fixed this a few years ago, but it's possible that I only fixed it for profile and not for compare.
If you could please confirm that you're running the most recent version of inStrain, that would be ideal. If so, this is likely a number / string problem that I need to fix. As a workaround, adding a letter to your bin names (even just an "a" in front of all of them) should fix the issue.
Apologies, Matt
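A minimal sketch of the workaround suggested above, assuming the .stb file is a two-column, tab-separated scaffold/bin file as described in this thread. The function name `prefix_bins`, the default prefix `bin_`, and the output file name are illustrative choices, not part of inStrain. Since the .stb file has to match between profile and compare, the renamed file would presumably need to be used for both steps.

```shell
# Hypothetical helper: add a non-numeric prefix to every bin name
# in a scaffold<TAB>bin file, leaving the scaffold column untouched.
prefix_bins() {
    # $1 = input .stb file, $2 = prefix to add (defaults to "bin_")
    awk -v prefix="${2:-bin_}" 'BEGIN { FS = OFS = "\t" } { $2 = prefix $2; print }' "$1"
}

# Example usage (file names are placeholders):
# prefix_bins hg.tsv > hg.prefixed.tsv
```

With the line shown earlier in the thread, the bin column `22903` would become `bin_22903`.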
I installed it via conda:
$ inStrain -h
...::: inStrain v1.8.0 :::...
Matt Olm and Alex Crits-Christoph. MIT License. Banfield Lab, UC Berkeley.
Choose one of the operations below for more detailed help. See https://instrain.readthedocs.io for documentation.
Example: inStrain profile -h
Main operations:
profile -> Create an inStrain profile (microdiversity analysis) from a mapping file
compare -> Compare multiple inStrain profiles (popANI, coverage_overlap, etc.)
Auxiliary operations:
check_deps -> Print a list of dependencies, versions, and whether they're working
parse_annotations -> Run a number of outputs based a table of gene annotations
quick_profile -> Quickly calculate coverage and breadth of a mapping using coverM
filter_reads -> Commands related to filtering reads from .bam files
plot -> Make figures from the results of "profile" or "compare"
other -> Other miscellaneous operations
Best, Sandro
Just to close out the issue: it is solved by adding a non-numeric name to the bins, as you suggested.
Thank you!
Dear all, I got the following error related to the stb file:
My stb file (-s /scratch/project_2007362/software/HumGutDB/hg.tsv) is a two-column, tab-separated file in the format scaffold\tbin, and it is the same file used in the profile step. I tried reducing the number of samples, but I still get the same error (with a different contig bin). What could I do?
Thanks in advance, Sandro