SegataLab / panphlan

PanPhlAn is a strain-level metagenomic profiling tool for identifying the gene composition of individual strains in metagenomic samples
MIT License

IndexError: list index out of range #16

Closed GaioTransposon closed 3 years ago

GaioTransposon commented 3 years ago

Hi guys,

Is this message familiar?

STEP 4: Define strain-specific gene-families presence/absence (1,-1,-2,-3 matrix, option --o_idx)
[W] No DNA 1,2,3 index file has been written because no strain was detected.

STEP 5: Get presence/absence of gene-families (1,-1 matrix, option --o_matrix)
Traceback (most recent call last):
  File "/shared/homes/12705859/miniconda3/envs/panphlan_env/bin/panphlan_profiling.py", line 763, in <module>
    main()
  File "/shared/homes/12705859/miniconda3/envs/panphlan_env/bin/panphlan_profiling.py", line 738, in main
    sample2family2presence = get_genefamily_presence_absence(sample2family2dnaidx, sample_stats, avg_genome_length, args)
  File "/shared/homes/12705859/miniconda3/envs/panphlan_env/bin/panphlan_profiling.py", line 574, in get_genefamily_presence_absence
    families = sample2family2dnaidx[dna_samples[0]].keys()
IndexError: list index out of range

My command was:

panphlan_profiling.py -p /shared/homes/12705859/panphlan/Blautia_wexlerae/Blautia_wexlerae_pangenome.tsv -i map/ --o_matrix ./matrix_out/profile_Blautia_wexlerae --min_coverage 1 --left_max 1.70 --right_min 0.30

I have run panphlan_profiling.py before and never had this problem. The only difference is how I downloaded the pangenomes (earlier via panphlan_download, now via the browser). Mapping worked fine. The files have content (over 20 MB each, judging by their size), which is why I wonder why no strain was detected.

Thank you Dany

leonarDubois commented 3 years ago

Hello,

The warning in step 4 tells you that no sample in your input passed the thresholds you provided (--min_coverage 1 --left_max 1.70 --right_min 0.30). That does not necessarily mean that your species is absent from the samples; it may be that PanPhlAn's detection limits have been reached. You could relax the coverage thresholds further (for example --min_coverage 0.9 --left_max 2 --right_min 0.10), but then your profiling results should be analyzed with care.

Also, keep in mind that, depending on the species and the samples, a significant fraction of samples may not pass the PanPhlAn analysis. I sometimes have more than half of the mapped samples fail profiling.

Out of curiosity, how many samples do you have in your input folder?
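
For readers hitting the same traceback: the crash in STEP 5 appears to follow directly from the warning in STEP 4. If no sample passes the thresholds, the list of profiled samples is empty, and taking its first element raises IndexError. Below is a minimal Python sketch of that failure mode with a defensive guard. Only the expression `sample2family2dnaidx[dna_samples[0]].keys()` comes from the traceback; the function names and toy data are hypothetical and are not PanPhlAn's actual code.

```python
def unguarded_families(sample2family2dnaidx, dna_samples):
    # Mirrors the expression from the traceback: fails when dna_samples is empty.
    return sample2family2dnaidx[dna_samples[0]].keys()


def families_of_first_sample(sample2family2dnaidx, dna_samples):
    """Return the gene families of the first profiled sample, or [] if none passed."""
    if not dna_samples:
        # No sample passed the coverage thresholds: nothing to index.
        return []
    return sorted(sample2family2dnaidx[dna_samples[0]].keys())


# Hypothetical toy data: one sample with two gene families.
toy_index = {"sample_A": {"g000002": 1, "g000001": -1}}
print(families_of_first_sample(toy_index, ["sample_A"]))
print(families_of_first_sample({}, []))
```

With a guard like this, a run in which every sample is filtered out would end with an empty result instead of a traceback.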

GaioTransposon commented 3 years ago

Hi!

Yes, I imagined that was the case. I have over 500 metagenomic input samples. Do you think that using single reads (forward reads only) instead of concatenated forward and reverse reads (as was suggested) could have affected this substantially?

leonarDubois commented 3 years ago

Hi !

Yes indeed, it's better to concatenate all the reads. Otherwise, a sample may fail to reach the min_coverage threshold and be discarded.
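
One way to concatenate forward and reverse reads before mapping is sketched below in Python. This is only an illustration: the function name and the file names in the usage comment are hypothetical, and the copy is byte-wise, so both inputs are assumed to be in the same format (both plain FASTQ or both gzip-compressed).

```python
import shutil
from pathlib import Path


def concatenate_reads(forward, reverse, merged):
    """Append the reverse-read file to the forward-read file, writing a new file."""
    with open(merged, "wb") as out:
        for part in (forward, reverse):
            with open(part, "rb") as src:
                # Stream the file in chunks so large read files are not loaded into memory.
                shutil.copyfileobj(src, out)
    return Path(merged)


# Hypothetical usage:
# concatenate_reads("sample_1.fastq", "sample_2.fastq", "sample.fastq")
```

The merged file then takes the place of the forward-only file as input to the mapping step.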