alexcritschristoph / soil_popgen

Reproducible scripts and notebooks for 2019 paper on population genetics in metagenomes
GNU General Public License v3.0

empty genes.tsv or linkage.tsv when running gene_statistics.py #3

Closed: palomo11 closed this issue 4 years ago

palomo11 commented 5 years ago

Hi,

I'm running gene_statistics.py on the per-sample population profiles (the same thing happens with study-wide population profiling). In some cases the SNVs.tsv, genes.tsv and linkage.tsv files are all created and contain data. In other cases, however, one of the three files (either genes.tsv or linkage.tsv, but mostly genes.tsv) contains only the header:

cat genes.tsv
gene    coverage    pi  length_with_coverage    sample
cat linkage.tsv
    index

Do you know why this is happening?

Thanks in advance.
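A quick way to find which profiles are affected is to count the data rows in each output file. This is a minimal sketch, not part of soil_popgen; it assumes (hypothetically) that each per-sample profile is its own subdirectory under one root folder, and the helper names `data_row_counts` and `header_only` are illustrative.

```python
from pathlib import Path

# Output file names taken from the issue; the directory layout is an assumption.
OUTPUT_NAMES = ("SNVs.tsv", "genes.tsv", "linkage.tsv")


def data_row_counts(root):
    """Map each output TSV found under `root` to its number of non-blank data rows."""
    counts = {}
    for name in OUTPUT_NAMES:
        for path in sorted(Path(root).rglob(name)):
            with path.open() as fh:
                next(fh, None)  # skip the header line (if any)
                counts[path] = sum(1 for line in fh if line.strip())
    return counts


def header_only(counts):
    """Return the paths whose file has no data rows (header only, or empty)."""
    return [path for path, n in counts.items() if n == 0]
```

Keeping the full counts (rather than only the empty files) also makes it easy to spot files with just a handful of rows.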

alexcritschristoph commented 5 years ago

Hi, is the script crashing at all? Do you see it print lines like "calculating nucleotide diversity", "Determining function of SNPs" and "Updating linkage table"? This could also occur if you've passed it the wrong Prodigal FAA file; make sure it matches the FASTA you originally ran on.

Thanks,
Alex
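One way to check whether a Prodigal gene file matches a genome FASTA is to compare contig names. This is a hypothetical pre-check, not how gene_statistics.py itself matches genes to contigs. It relies on Prodigal's convention of naming each gene `<contig>_<gene number>`, so stripping the last `_N` from a gene ID should give a contig name present in the genome FASTA. The function names are illustrative, and the same check works for the FNA file.

```python
def fasta_ids(lines):
    """Yield the record ID (first whitespace-delimited token) of each FASTA header."""
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                yield parts[0]


def unmatched_gene_contigs(genome_lines, prodigal_lines):
    """Return Prodigal gene IDs whose parent contig is absent from the genome FASTA."""
    contigs = set(fasta_ids(genome_lines))
    # "scaffold_1_7" -> "scaffold_1": drop the trailing gene number.
    return [gid for gid in fasta_ids(prodigal_lines)
            if gid.rsplit("_", 1)[0] not in contigs]
```

Both arguments can be open file handles, e.g. `unmatched_gene_contigs(open("genome.fasta"), open("genes.faa"))` (file names hypothetical). A non-empty result means the headers don't match.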

palomo11 commented 5 years ago

Hi,

It does not crash. I think the problem is the headers of the fasta and fna files: for some of the genomes they don't match, and I think that's the reason. Also, even when SNVs are detected and the file is created, all the SNVs are classified as I (intergenic) instead of N or S, because no genes are recognized.

I will change the headers and run it again.
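The "everything is I" symptom follows from genes being looked up by contig name. The sketch below is a hypothetical illustration, not the code in gene_statistics.py. It assumes genes are stored per contig as 1-based inclusive `(start, end, sequence)` tuples on the forward strand, and it ignores reverse-strand and overlapping genes. `classify_snv` and `CODON_TABLE` are illustrative names. If the SNV's contig name matches no gene record, the lookup returns nothing and the SNV can only be called I.

```python
# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(contig, pos, alt, genes):
    """Return "N", "S" or "I" for an SNV at 1-based `pos` on `contig`.

    `genes` maps contig name -> list of (start, end, seq) tuples (forward strand only).
    """
    # A contig name that doesn't match the gene records yields no genes at all.
    for start, end, seq in genes.get(contig, []):
        if start <= pos <= end:
            offset = pos - start
            codon_start = offset - offset % 3
            codon = seq[codon_start:codon_start + 3]
            i = offset % 3
            mutant = codon[:i] + alt + codon[i + 1:]
            same = CODON_TABLE[codon] == CODON_TABLE[mutant]
            return "S" if same else "N"
    return "I"  # not inside any known gene
```

With matching names, an SNV in a coding region gets N or S. With mismatched names, the same SNV falls through to I.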

However, this is not the case for linkage. Out of 116 profiles, the file is empty in only a very few cases, and in a few others it contains only 2-4 lines (usually the same MAG in different samples). This happens both when analyzing the whole genome and when analyzing the genes file.

Thanks,

Alex

palomo11 commented 5 years ago

After matching both headers, the genes.tsv files are properly created.

Thanks!

Any idea about the linkage?

alexcritschristoph commented 5 years ago

A linkage file could be empty if SNPs are very rare and tend to be far from each other, or if you're below the default minimum coverage at both SNPs required to calculate linkage. I think it might be built in as 20x or 30x (which means that you'll require 50x+ to hit this
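Both conditions can empty the table, which is easy to see by listing which SNV pairs would even be candidates for linkage. This is a minimal, hypothetical sketch, not the script's implementation. `linkable_pairs`, `min_cov` and `max_dist` are illustrative names. The default of 20 only echoes the "20x or 30x" guess above, and `max_dist` stands in for the span a read or read pair can cover.

```python
from itertools import combinations


def linkable_pairs(snvs, min_cov=20, max_dist=300):
    """Return (scaffold, pos_a, pos_b) pairs that pass both filters.

    `snvs` is an iterable of (scaffold, position, coverage) tuples.
    A pair passes if both SNVs have coverage >= min_cov and lie
    within max_dist bases of each other on the same scaffold.
    """
    by_scaffold = {}
    for scaffold, pos, cov in snvs:
        # Filtering single SNVs is equivalent to requiring coverage at both SNPs.
        if cov >= min_cov:
            by_scaffold.setdefault(scaffold, []).append(pos)
    pairs = []
    for scaffold, positions in by_scaffold.items():
        for a, b in combinations(sorted(positions), 2):
            if b - a <= max_dist:
                pairs.append((scaffold, a, b))
    return pairs
```

Sparse SNVs (too far apart) or low-coverage sites (below `min_cov`) both shrink this list, and it can easily end up empty.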