biocompibens / ALFA

ALFA: Annotation Landscape for Aligned Reads
MIT License

Error: at least one feature in the annotation file doesn't have a biotype description. ALFA won't be able to work robustly. #7

Open zztin opened 3 years ago

zztin commented 3 years ago

Hi,

Thank you for the package! I am keen to use ALFA for functional annotation of cell-free DNA. As a first step, I need to provide an annotation file (GTF format with biotypes) for my reference genome (hg38). However, the GTF files downloaded from NCBI and UCSC (details below) both trigger this error. Could you suggest where I can get a GTF in the correct format (and which tracks you recommend) for this purpose?

The details of the two sources I tried:

  1. NCBI: https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39/ (top-right button: Download Assembly --> RefSeq --> File Type: Genomic GTF)
  2. UCSC: http://genome.ucsc.edu/cgi-bin/hgTables

Both of these return: Error: at least one feature in the annotation file doesn't have a biotype description. ALFA won't be able to work robustly.
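To see which feature lines trigger this kind of complaint, one option is to count the lines that lack a biotype attribute, grouped by feature type (column 3). The following is a minimal sketch, not part of ALFA: it assumes the attribute ALFA looks for is named "gene_biotype" (which the follow-up comment in this thread suggests), and the function names and the example file name are hypothetical.

```python
import re
from collections import Counter

# Matches one GTF attribute: a key followed by a quoted or unquoted value.
# Quoted values may contain semicolons, so they are matched as a unit.
ATTRIBUTE = re.compile(r'(\w+)\s+(?:"[^"]*"|[^\s;]+)\s*(?:;|$)')


def attribute_keys(attr_field):
    """Return the set of attribute keys found in GTF column 9."""
    return set(ATTRIBUTE.findall(attr_field))


def count_missing(lines, key="gene_biotype"):
    """Count lines lacking `key`, grouped by feature type (column 3).

    `key` defaults to "gene_biotype", which is an assumption about
    what ALFA expects, not something taken from its source code.
    """
    missing = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GTF record
        if key not in attribute_keys(fields[8]):
            missing[fields[2]] += 1
    return missing


if __name__ == "__main__":
    # "annotation.gtf" is a placeholder file name.
    with open("annotation.gtf") as handle:
        for feature, n in count_missing(handle).most_common():
            print(feature, n)
```

If every line is reported as missing the key, the file probably uses a different attribute name for the biotype (or none at all) rather than having a few incomplete records.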

Best, Li-Ting

zztin commented 3 years ago

Hi, I have solved the problem for now by downloading the GTF from GENCODE and renaming the attribute key "gene_type" in column 9 to "gene_biotype". After this change, the file works with ALFA. I'm not sure this is the correct thing to do, but gene_type does seem to refer to the biotype in this context (based on the data format description).
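For anyone wanting to reproduce the renaming, here is a rough sketch of how it could be scripted. It only rewrites the "gene_type" key in column 9 and leaves everything else (including "transcript_type") untouched. The function names and file names are hypothetical, and this is one possible way to do it, not the procedure ALFA itself documents.

```python
import re

# Match the gene_type key only at the start of the attribute field or
# right after a ';' separator, so keys such as transcript_type are untouched.
GENE_TYPE_KEY = re.compile(r'(^|;\s*)gene_type(\s+")')


def rename_gene_type(line):
    """Rename the gene_type attribute key to gene_biotype in one GTF line."""
    if line.startswith("#"):
        return line  # header/comment lines are passed through unchanged
    body = line.rstrip("\n")
    newline = line[len(body):]
    fields = body.split("\t")
    if len(fields) != 9:
        return line  # leave malformed or blank lines as they are
    fields[8] = GENE_TYPE_KEY.sub(r"\1gene_biotype\2", fields[8])
    return "\t".join(fields) + newline


def convert(src_path, dst_path):
    """Write a copy of the GTF at src_path with the key renamed."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(rename_gene_type(line))


if __name__ == "__main__":
    # Both file names are placeholders.
    convert("gencode.annotation.gtf", "gencode.annotation.biotype.gtf")
```

Writing to a new file rather than editing in place keeps the original GENCODE download available for other tools that expect the "gene_type" key.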

I would still appreciate your recommendations on where to download GTF tracks when working with human data.

Another related question: is it possible to use ALFA to determine the coverage of DNA fragments on Alu elements, LINEs, LTRs and other repetitive elements? In that case, can I replace "gene_biotype" with a custom tag such as "mobile_element_type" in ALFA? Would the normalization stay intact in that case?

Thank you!

mbahin commented 3 years ago

Hi,

First of all, I must say that the package is no longer maintained (sorry). It should still work as it did when it was developed, though.

Regarding the annotation file, I used to download it from Ensembl. For example, for Homo sapiens, I was getting the file from here. However, your trick should be fine, since "gene_type" is described as what we use as the biotype (and I checked that the list of gene_type values is consistent with the list of biotypes we used to work with).

Cheers, Mathieu