almlab / GMbC_HGTs

Scripts and data resources from the HGT analysis of GMbC isolate genomes

Text mining scripts for the assignment of genes #1

Closed: wshuai294 closed this issue 2 years ago

wshuai294 commented 2 years ago

Hello,

Your valuable paper "Elevated rates of horizontal gene transfer in the industrialized human microbiome" mentions using text mining to assign genes to the phage, plasmid, transposon, and antibiotic resistance categories.

Could you make the gene assignment scripts publicly available? That would be very helpful.

Thank you very much.

Best, WANG Shuai

mgroussi commented 2 years ago

Hey Wang Shuai, of course! The first step was to annotate genes with different tools and databases, as described in the paper (e.g. Resfams, eggNOG, etc.). We then used awk commands to mine the inferred eggNOG/InterProScan annotations for the terms listed below, which helped assign genes to broad functional categories (e.g. antibiotic resistance). For this we also used the annotations and homology information from each specific database/tool (e.g. the Resfams family). Hope this helps!
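The awk step described above amounts to scanning each annotation line for a regular expression and keeping the gene identifier of every matching line, much like `awk -F'\t' '/pattern/ {print $1}'`. The Python sketch below mirrors that behaviour under stated assumptions: a hypothetical tab-separated layout with the gene ID in the first column, and `#` lines treated as comments. This is not the authors' script, and the real annotation files may be laid out differently.

```python
import re
from typing import Iterable, List


def genes_matching(lines: Iterable[str], pattern: str) -> List[str]:
    """Return the first tab-separated field of every line that matches pattern.

    Like awk's /pattern/, the regex is searched anywhere in the whole line and
    is case-sensitive. Skipping lines that start with '#' is an ASSUMPTION
    about the input format; plain awk would not skip them.
    """
    regex = re.compile(pattern)
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank line or (assumed) comment/header line
        if regex.search(line):
            hits.append(line.split("\t", 1)[0])  # gene ID assumed in column 1
    return hits


if __name__ == "__main__":
    # Hypothetical annotation lines: gene ID <TAB> free-text description.
    example = [
        "#query\tdescription",
        "gene_001\tPhage major capsid protein",
        "gene_002\tABC transporter permease",
        "gene_003\tphage tail tape measure protein",
    ]
    print(genes_matching(example, r"capsid|phage|tail|head|tape measure|antitermination"))
```

Because the search covers the whole line, a pattern can also hit the gene ID or other columns, which is the same behaviour as an unqualified awk `/pattern/`.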

Phage:

/capsid|phage|tail|head|tape measure|antitermination/

Plasmid:

/resolv|relax|conjug|trb|plasmid|type IV|toxin|chromosome partitioning|chromosome segregation|Resolv|Relax|Conjug|Trb|Plasmid|Type IV|Toxin|Chromosome partitioning|Chromosome segregation/

Transposon:

/transpos|insertion|resolv|Tra[A-Z]|Tra[0-9]|IS[0-9]|conjugate transposon|Transpos|Insertion|Resolv|Tra[A-Z]|Tra[0-9]|IS[0-9]|Conjugate transposon/

Antibiotic resistance:

/multidrug|azole resistance|antibiotic resistance|TetR|tetracycline resistance|VanZ|betalactam|beta-lactam|antimicrob|lantibio|Multidrug|Azole resistance|Antibiotic resistance|TetR|tetr|tetR|Tetracycline resistance|VanZ|vanz|vanZ|VANZ|Betalactam|Beta-lactam|Antimicrob|Lantibio/
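Taken together, the four patterns can assign a gene to zero, one, or several categories. Some terms (e.g. `resolv`/`Resolv`) appear in both the Plasmid and Transposon patterns, and broad terms such as `tail`, `head`, `toxin` or `TetR` will also match proteins unrelated to mobile elements or resistance. The Python sketch below applies the patterns exactly as listed (case-sensitive, as awk is by default) to one free-text annotation and returns every matching category. The thread does not say how overlaps were resolved or exactly how keyword hits were combined with database-specific evidence such as Resfams families. The `extra_categories` argument is a hypothetical hook for that step, not the authors' method.

```python
import re
from typing import Dict, Iterable, Set

# Patterns copied verbatim from the comment above (duplicate alternatives kept).
CATEGORY_PATTERNS: Dict[str, str] = {
    "Phage": r"capsid|phage|tail|head|tape measure|antitermination",
    "Plasmid": (
        r"resolv|relax|conjug|trb|plasmid|type IV|toxin|chromosome partitioning|"
        r"chromosome segregation|Resolv|Relax|Conjug|Trb|Plasmid|Type IV|Toxin|"
        r"Chromosome partitioning|Chromosome segregation"
    ),
    "Transposon": (
        r"transpos|insertion|resolv|Tra[A-Z]|Tra[0-9]|IS[0-9]|conjugate transposon|"
        r"Transpos|Insertion|Resolv|Tra[A-Z]|Tra[0-9]|IS[0-9]|Conjugate transposon"
    ),
    "Antibiotic resistance": (
        r"multidrug|azole resistance|antibiotic resistance|TetR|tetracycline resistance|"
        r"VanZ|betalactam|beta-lactam|antimicrob|lantibio|Multidrug|Azole resistance|"
        r"Antibiotic resistance|TetR|tetr|tetR|Tetracycline resistance|VanZ|vanz|vanZ|"
        r"VANZ|Betalactam|Beta-lactam|Antimicrob|Lantibio"
    ),
}

# Compile once; matching is case-sensitive, like awk's /regex/ by default.
COMPILED = {name: re.compile(p) for name, p in CATEGORY_PATTERNS.items()}


def assign_categories(annotation: str, extra_categories: Iterable[str] = ()) -> Set[str]:
    """Return every category whose pattern occurs anywhere in the annotation.

    extra_categories is a HYPOTHETICAL hook for labels derived from
    database-specific hits (e.g. a Resfams family); it is simply unioned in.
    """
    hits = {name for name, regex in COMPILED.items() if regex.search(annotation)}
    hits.update(extra_categories)
    return hits


if __name__ == "__main__":
    for desc in [
        "Tn916 transposase, conjugative transposon protein",  # → ['Plasmid', 'Transposon']
        "TetR family transcriptional regulator",  # → ['Antibiotic resistance']
        "hypothetical protein",  # → []
    ]:
        print(desc, sorted(assign_categories(desc)))
```

The second example shows why database-specific evidence matters: a TetR-family regulator matches the keyword list even though many members of that family are not involved in resistance.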

wshuai294 commented 2 years ago

It helps a lot! Thank you very much.