biocore / microprot

structural annotation pipeline for microbial genomes and metagenomes
BSD 3-Clause "New" or "Revised" License

cluster Micronota genes #21

Closed tkosciol closed 7 years ago

tkosciol commented 7 years ago

Run CD-HIT on the Micronota genes (Prodigal outputs) and calculate stats for different clustering thresholds.
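A minimal sketch of how the CD-HIT runs could be set up, one command per threshold. The output names and thresholds follow the table in this issue; the helper name `cdhit_command`, the output directory, and the `.faa` suffix on the output are hypothetical. The flags used (`-i`, `-o`, `-c`, `-n`, `-T`, `-M`) are standard CD-HIT options, with `-n 5` being the word size the CD-HIT guide recommends for identity thresholds between 0.7 and 1.0.

```python
from pathlib import Path

# Thresholds from the issue: the number in the name is the % sequence identity.
THRESHOLDS = {"micronota_100": 1.0, "micronota_90": 0.9, "micronota_70": 0.7}


def cdhit_command(input_faa, out_dir, name, identity, threads=1):
    """Build (but do not run) a cd-hit command line for one threshold.

    The function name and output layout are illustrative assumptions.
    """
    out_path = Path(out_dir) / f"{name}.faa"
    return [
        "cd-hit",
        "-i", str(input_faa),      # input protein FASTA
        "-o", str(out_path),       # representative sequences
        "-c", f"{identity:.2f}",   # sequence identity threshold
        "-n", "5",                 # word size suitable for 0.7 <= c <= 1.0
        "-T", str(threads),        # number of threads
        "-M", "0",                 # no memory limit
    ]


def all_commands(input_faa, out_dir, threads=1):
    """One command per clustering threshold, keyed by output name."""
    return {
        name: cdhit_command(input_faa, out_dir, name, identity, threads)
        for name, identity in THRESHOLDS.items()
    }
```

Each command could then be executed with `subprocess.run(cmd, check=True)` on a machine where `cd-hit` is installed.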

Write the output to micronota_stats.md. Output structure:

date: DD-MM-YYYY
micronota genomes: X0

| name | sequences |
| --- | --- |
| micronota_raw | X1 |
| micronota_100 | X2 |
| micronota_90 | X3 |
| micronota_70 | X4 |

The number indicates the clustering threshold, e.g. micronota_90 means clustering at a 90% sequence identity threshold.
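The stats file described above could be produced roughly as follows. This is only a sketch: the function names are hypothetical, sequences are counted simply as FASTA header lines (those starting with `>`), and rendering the name/sequences rows as a Markdown pipe table is an assumption about the intended layout.

```python
from datetime import date


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def render_stats(n_genomes, counts, today=None):
    """Render the micronota_stats.md text.

    counts maps a name such as 'micronota_90' to its sequence count.
    The Markdown table layout is an assumption, not a confirmed format.
    """
    today = today or date.today()
    lines = [
        f"date: {today.strftime('%d-%m-%Y')}",
        f"micronota genomes: {n_genomes}",
        "",
        "| name | sequences |",
        "| --- | --- |",
    ]
    lines += [f"| {name} | {n} |" for name, n in counts.items()]
    return "\n".join(lines) + "\n"
```

The counts would come from calling `count_fasta_records` on the raw gene file and on each CD-HIT output, and the returned text can be written to `micronota_stats.md`.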

tkosciol commented 7 years ago

Data location: the faa file for each genome is located in barnacle:/projects/genome_annotation/201605/annot, each in its own directory (e.g. G001281285/tmp/prodigal.faa). The genome IDs (e.g. G001281285) and their info are in this table: /home/evko1434/repophlan/repophlan_microbes_wscores.txt

tkosciol commented 7 years ago

@RNAer ok, maybe let's start by getting the number for micronota_raw and putting all predicted genes in one place.
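One possible way to do this first step, assuming the `<genome id>/tmp/prodigal.faa` layout described in the data-location comment: concatenate every per-genome file into a single FASTA and count genomes and sequences along the way. The function name `gather_genes` and the output path are hypothetical.

```python
from pathlib import Path


def gather_genes(annot_dir, out_faa):
    """Concatenate all <genome>/tmp/prodigal.faa files into out_faa.

    Returns (number of genomes, number of sequences). The sequence count
    corresponds to micronota_raw, the genome count to 'micronota genomes'.
    """
    n_genomes = 0
    n_seqs = 0
    with open(out_faa, "w") as out:
        for faa in sorted(Path(annot_dir).glob("*/tmp/prodigal.faa")):
            n_genomes += 1
            last = "\n"
            with open(faa) as handle:
                for line in handle:
                    if line.startswith(">"):
                        n_seqs += 1
                    out.write(line)
                    last = line
            # Guard against files that lack a trailing newline, so the next
            # file's first header does not get glued onto a sequence line.
            if not last.endswith("\n"):
                out.write("\n")
    return n_genomes, n_seqs
```

The output file should live outside `annot_dir`. If sequence IDs can repeat across genomes, prefixing each header with the genome directory name would keep them unique.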

tkosciol commented 7 years ago

Done! The data is in /projects/microprot/data/micronota/clustering on Barnacle.