Differences between `kos.parsed.counts.tsv` files and my own script for summing counts per KEGG KO #36

NBISweden / nbis-meta

A Snakemake workflow for metagenomic projects

MIT License


I'm finding differences between `kos.parsed.counts.tsv` (in `results/annotation/$SAMPLE/`) and the output of my own script, which sums the counts annotated to each KEGG KO. The script is built around the following pandas lines:

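```python
import pandas as pd

# merged_df is built earlier in the script (not shown) and holds the eggNOG
# 'KEGG_ko' column (e.g. "ko:K00001,ko:K00002") plus 'counts' and 'length'.
# Strip the "ko:" prefix and split multi-KO annotations into lists
kegg_ko_sr = merged_df['KEGG_ko'].str.replace('ko:', '', regex=False).str.split(',')
# One row per (row, KO) pair: each KO gets the row's full counts and length
kegg_ko_df = merged_df.assign(KEGG_ko=kegg_ko_sr).explode('KEGG_ko')

# Sum counts and length per KO (rows with a missing KEGG_ko are dropped by groupby)
group_kegg_ko = (
    kegg_ko_df.groupby('KEGG_ko', as_index=False)[['counts', 'length']]
    .agg('sum')
)
```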
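To see what these steps do with multi-KO genes and missing annotations, here is a minimal self-contained run of the same logic on a made-up four-gene table. The gene names and values are hypothetical, and using `"-"` as the placeholder for an empty eggNOG field is an assumption for illustration, not something taken from nbis-meta output.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table standing in for merged_df (made-up values):
# g2 has two KOs, g3 has no annotation (NaN), g4 has a "-" placeholder.
merged_df = pd.DataFrame({
    "gene": ["g1", "g2", "g3", "g4"],
    "KEGG_ko": ["ko:K00001", "ko:K00001,ko:K00002", np.nan, "-"],
    "counts": [10, 4, 7, 3],
    "length": [900, 1200, 600, 450],
})

# Same steps as the snippet above
kegg_ko_sr = merged_df["KEGG_ko"].str.replace("ko:", "", regex=False).str.split(",")
kegg_ko_df = merged_df.assign(KEGG_ko=kegg_ko_sr).explode("KEGG_ko")
group_kegg_ko = (
    kegg_ko_df.groupby("KEGG_ko", as_index=False)[["counts", "length"]]
    .agg("sum")
)

print(group_kegg_ko)
print("total counts in table:", merged_df["counts"].sum())
print("total counts after per-KO sum:", group_kegg_ko["counts"].sum())
```

In this toy run, g2's counts are added in full to both K00001 and K00002, g3 (missing annotation) is silently dropped by `groupby`, and `"-"` becomes its own group, so the per-KO totals (21) no longer add up to the table total (24). Each of these is a choice the workflow could make differently; whether `kos.parsed.counts.tsv` handles them the same way is not something I have confirmed.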

Do you know what could be causing the difference? Does the code that creates `kos.parsed.counts.tsv` do any additional processing that could explain it?
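To narrow down where the two results diverge, a small helper can line up both per-KO tables and list only the KOs whose totals differ. `compare_ko_counts` is a hypothetical name, not part of nbis-meta, and the sketch assumes each table has already been reduced to a pandas Series indexed by KO id, since the exact column layout of `kos.parsed.counts.tsv` is not shown here.

```python
import pandas as pd


def compare_ko_counts(mine: pd.Series, workflow: pd.Series, tol: float = 1e-6) -> pd.DataFrame:
    """Align two per-KO count Series (index = KO id) and return only the rows that differ.

    KOs present in just one Series are filled with 0, so they show up as differences.
    """
    # Outer-join on the KO index; the dict keys become the column names
    both = pd.concat({"mine": mine, "workflow": workflow}, axis=1).fillna(0)
    both["diff"] = both["mine"] - both["workflow"]
    # Keep real differences only, largest absolute difference first
    return both[both["diff"].abs() > tol].sort_values("diff", key=abs, ascending=False)


# Hypothetical usage with made-up numbers
mine = pd.Series({"K00001": 14, "K00002": 4})
workflow = pd.Series({"K00001": 10, "K00003": 2})
result = compare_ko_counts(mine, workflow)
print(result)
```

KOs that appear in only one table show a difference equal to their full count, which could point to filtering or a placeholder group on one side, while KOs present in both with different totals could point to a different treatment of genes annotated with several KOs.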
