NBChub / bgcflow

Snakemake workflow for the analysis of biosynthetic gene clusters across large collections of genomes (pangenomes)
https://github.com/NBChub/bgcflow/wiki
MIT License

genome_id = sequence in df_arts #357

Open Sam-Will opened 1 month ago

Sam-Will commented 1 month ago

Many genome_id values in df_arts_allhits_as-7.1.0.csv are 'sequence', 'extrachromosomal' or 'genome' rather than an actual genome ID. This appears to happen only when the duplication column is TRUE.

E.g.


NBC_00193__TIGR00558__1__1721 | TIGR00558 | NBC_00193 | phzG | phenazine biosynthesis FMN-dependent oxidase PhzG | 2038838 | 2039459 | 1 | scaffold_1 | 1721 | CDS | 11 | NBC_00193__arts_core__TIGR00558 | TRUE | TRUE | FALSE | NBC_00193_4.region011
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
NBC_00193__TIGR00945__3__7070 | TIGR00945 | sequence | tatC | twin-arginine translocase subunit TatC | 121892 | 122693 | 1 | scaffold_3 | 7070 | CDS | 11 | NBC_00193__arts_core__TIGR00945 | TRUE | FALSE | FALSE | NBC_00193_6.region002
NBC_00193__TIGR01534__1__5156 | TIGR01534 | NBC_00193 | gap | type I glyceraldehyde-3-phosphate dehydrogenase | 5831800 | 5832799 | 1 | scaffold_1 | 5156 | CDS | 11 | NBC_00193__arts_core__TIGR01534 | TRUE | TRUE | TRUE | NBC_00193_4.region015
NBC_00193__TIGR02393__1__511 | TIGR02393 | sequence | - | RNA polymerase sigma factor | 637782 | 639393 | -1 | scaffold_1 | 511 | CDS | 11 | NBC_00193__arts_core__TIGR02393 | TRUE | FALSE | FALSE | NBC_00193_4.region007
NBC_00193__TIGR02504__1__538 | TIGR02504 | NBC_00193 | - | vitamin B12-dependent ribonucleotide reductase | 672375 | 675270 | -1 | scaffold_1 | 538 | CDS | 11 | NBC_00193__arts_core__TIGR02504 | FALSE | FALSE | FALSE | NBC_00193_4.region007
NBC_00193__TIGR03156__1__547 | TIGR03156 | NBC_00193 | hflX | GTPase HflX | 690034 | 691543 | -1 | scaffold_1 | 547 | Core | 11 | NBC_00193__arts_core__TIGR03156 | FALSE | FALSE | FALSE | NBC_00193_4.region007
NBC_00193__TIGR03446__1__1401 | TIGR03446 | NBC_00193 | mca | mycothiol conjugate amidase Mca | 1702115 | 1702997 | -1 | scaffold_1 | 1401 | Core | 11 | NBC_00193__arts_core__TIGR03446 | FALSE | FALSE | FALSE | NBC_00193_4.region010
NBC_00193__TIGR03451__2__6223 | TIGR03451 | NBC_00193 | - | S-(hydroxymethyl)mycothiol dehydrogenase | 748963 | 750049 | -1 | scaffold_2 | 6223 | CDS | 11 | NBC_00193__arts_core__TIGR03451 | FALSE | TRUE | FALSE | NBC_00193_7.region004
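
In case it helps with reproducing this, here is a rough sketch of how the affected rows could be counted with the Python standard library. It is not part of bgcflow. The column headers `genome_id` and `duplication` are assumed from the description above and may not match the real CSV header exactly, `count_placeholders` is a hypothetical helper, and the inline sample rows are made up for illustration only.

```python
import csv
import io
from collections import Counter

# Placeholder strings reported in this issue in place of a real genome ID.
PLACEHOLDERS = {"sequence", "extrachromosomal", "genome"}


def count_placeholders(rows, id_col="genome_id", dup_col="duplication"):
    """Count placeholder genome_id values, grouped by the duplication flag.

    The default column names are assumptions; adjust them to the real header.
    """
    counts = Counter()
    for row in rows:
        if row[id_col].strip() in PLACEHOLDERS:
            counts[row[dup_col].strip().upper()] += 1
    return counts


# Made-up sample rows for illustration. With the real file, use instead:
#   with open("df_arts_allhits_as-7.1.0.csv", newline="") as fh:
#       print(dict(count_placeholders(csv.DictReader(fh))))
sample = io.StringIO(
    "genome_id,duplication\n"
    "NBC_00193,TRUE\n"
    "sequence,TRUE\n"
    "NBC_00193,FALSE\n"
    "genome,TRUE\n"
)
result = count_placeholders(csv.DictReader(sample))
print(dict(result))
```

If the observation above holds, the result for the real file should contain only a `TRUE` key; any count under `FALSE` would mean the placeholders are not limited to duplicated hits.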

If you need more information, don't hesitate to ask.

Thanks, Sam

matinnuhamunada commented 1 month ago

Thanks, this seems to be related to #330