OliveiraDS-hub / ChimeraTE

A pipeline to detect chimeric transcripts derived from genes and transposable elements.
GNU General Public License v3.0

can't find script of mode2 (transcripts_IDs_NCBI.sh) #3

Closed xixifa closed 1 year ago

xixifa commented 1 year ago

Dear Daniel, I am studying ChimeraTE, which is very useful for me, but I found that the GitHub repository is missing the Mode 2 script, transcripts_IDs_NCBI.sh. Could you please upload this file when you have time? I would be very grateful for your help.

Best wishes, May

OliveiraDS-hub commented 1 year ago

Dear May, sorry for the delayed answer. We are currently working on the review of the pipeline.

I've uploaded the script that converts IDs from NCBI's pattern to the one used by ChimeraTE Mode 2. Remember that you need to download the "rna.fna" file from the NCBI RefSeq repository for your species of interest. For example, for the human genome you can download it from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_rna.fna.gz

The first two transcripts in this file are:

NM_000014.6 Homo sapiens alpha-2-macroglobulin (A2M), transcript variant 1, mRNA
NM_000015.3 Homo sapiens N-acetyltransferase 2 (NAT2), mRNA

You need exactly this file for your species of interest to run the script.
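Before running the conversion, it can help to confirm that the file really has NCBI-style headers like the two shown above. The following is only a minimal sketch, assuming the file has already been decompressed (for example with gunzip) and that header lines start with ">" as in standard FASTA. The function names first_headers and all_refseq_headers are hypothetical helpers written for this illustration; they are not part of ChimeraTE. The accession check assumes the RefSeq RNA prefixes NM_, NR_, XM_ and XR_.

```shell
#!/usr/bin/env bash
# Sketch: inspect the headers of a decompressed rna.fna before converting IDs.
# These helpers are hypothetical and are not shipped with ChimeraTE.

# Print the first N FASTA header lines (default 2) without the leading ">".
first_headers() {
    local fasta="$1" n="${2:-2}"
    grep '^>' "$fasta" | head -n "$n" | sed 's/^>//'
}

# Succeed only if the file has at least one header and every header starts
# with a RefSeq RNA accession such as NM_000014.6 (assumed prefixes:
# NM_, NR_, XM_, XR_).
all_refseq_headers() {
    local fasta="$1"
    # No header lines at all: not a usable FASTA file.
    grep -q '^>' "$fasta" || return 1
    # Fail if any header does NOT match the accession pattern.
    ! grep '^>' "$fasta" | grep -Evq '^>(NM|NR|XM|XR)_[0-9]+\.[0-9]+( |$)'
}
```

After sourcing these functions, calling first_headers on the human rna.fna should list the two NM_ entries quoted above, and all_refseq_headers returns a non-zero status if some header does not follow the pattern, which may indicate that a different file (for example a genomic or protein FASTA) was downloaded.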

Then:

bash transcript_IDs_NCBI.sh --transcripts rna.fna --output file.fa

You can also run bash transcript_IDs_NCBI.sh --help

I'm closing this issue; let me know if you need anything else in the near future.

Cheers