dib-lab / dib_rotation

Metagenomics DIB-lab rotation project
https://dib-lab.github.io/dib_rotation/
BSD 3-Clause "New" or "Revised" License

KEGGdecoder input formatting. #24

Open cErikson opened 3 years ago

cErikson commented 3 years ago

KEGGdecoder requires the following input format. Note the underscore between the genome ID and the read ID.

genomeId_fastaId \t keggid

Otherwise, KEGGdecoder treats each entry as its own genome and segfaults. The following commands produce the appropriate format:

cat GCA_001508995.1_ASM150899v1_protein_kofamscan.txt | sed -ne 's/.*/GCA001508995_&/p' > GCA_001508995.1_ASM150899v1_protein_kofamscan_prefix.txt
cat kofamscan_results.txt | sed -ne 's/.*/kofamscan_&/p' > kofamscan_results_prefix.txt 
cat *prefix.txt > kofamscan_res_prefix.txt
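
For more than a couple of genomes, the same prefixing step could be wrapped in a small shell function. This is only a sketch: `add_genome_prefix` is a hypothetical helper name, not part of KEGGdecoder or KofamScan. It assumes the input has one `fastaId \t keggid` record per line, and, going by the `GCA001508995` prefix used above, that the genome ID itself should not contain an underscore.

```shell
#!/bin/sh
# Hypothetical helper (not part of KEGGdecoder or KofamScan):
# prefix every record of a KofamScan results file with "<genome_id>_".
# Usage: add_genome_prefix GENOME_ID INPUT_FILE > OUTPUT_FILE
add_genome_prefix() {
    genome_id=$1
    input=$2

    # Assumption: the genome ID must be underscore-free so that the only
    # underscore before the fasta ID is the separator added here.
    case $genome_id in
        *_*)
            echo "genome ID must not contain '_': $genome_id" >&2
            return 1
            ;;
    esac

    # NF skips blank lines; every other line gets the prefix prepended.
    awk -v id="$genome_id" 'NF { print id "_" $0 }' "$input"
}

# Example call, using the file names from this issue:
# add_genome_prefix GCA001508995 GCA_001508995.1_ASM150899v1_protein_kofamscan.txt \
#     > GCA_001508995.1_ASM150899v1_protein_kofamscan_prefix.txt
```

When concatenating the prefixed files afterwards, it may be safer to give the combined file a name that does not match `*prefix.txt`, so that a second run does not pick up the previous combined output as an input.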