crisprVerse / crisprDesign

Comprehensive design of CRISPR gRNAs for nucleases and base editors
MIT License

Error building gene annotation object with GFF for CHM13v2.0 #26

Closed nicodemus88 closed 5 months ago

nicodemus88 commented 11 months ago

Hi developers, I am having trouble building a gene annotation object for CHM13 using the GFF file obtained here: https://github.com/marbl/CHM13

Here's the error message I got:

> txdb <- getTxDb(organism = "Homo sapiens", file = "./ref_genome/chm13v2.0_RefSeq_Liftoff_v5.1.gff3")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning messages:
1: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
  some transcripts have no "transcript_id" attribute ==> their name ("tx_name" column
  in the TxDb object) was set to NA
2: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
  the transcript names ("tx_name" column in the TxDb object) imported from the
  "transcript_id" attribute are not unique
3: In .find_exon_cds(exons, cds) :
  The following transcripts have exons that contain more than one CDS (only the first
  CDS was kept for each exon): NM_001134939.1, NM_001172437.2, NM_001184961.1,
  NM_001301020.1, NM_001301302.1, NM_001301371.1, NM_002537.3, NM_004152.3,
  NM_015068.3, NM_016178.2
> grList <- TxDb2GRangesList(txdb)
'select()' returned many:many mapping between keys and columns
Error in `rownames<-`(`*tmp*`, value = names(x)) : 
  missing values not allowed in rownames
In addition: Warning message:
In .set_group_names(ans, use.names, txdb, "tx") :
  some group names are NAs or duplicated

Could you please advise on how to proceed? I don't see any duplicated row names in the TxDb object, so I'm unsure what the error message means...

Additionally, do the warning messages affect the generation of the GRanges object? There are some NA and duplicated tx_name values in the TxDb object.

Thank you!

Jfortin1 commented 11 months ago

Thank you @nicodemus88 for reporting this -- we'll get back to you soon

nicodemus88 commented 11 months ago

@Jfortin1 Thanks! Could I get an update on this?

ltHobbes commented 6 months ago

Hi @nicodemus88, thank you for bringing this to our attention. The TxDb object lacks some annotations, which ultimately caused the error. I've pushed a fix, which you can access for now using this version of crisprDesign:

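# Install devtools if it is not already available
install.packages("devtools")
library(devtools)
# Install crisprDesign from the issue26 ref, which contains the fix
devtools::install_github("crisprVerse/crisprDesign@issue26")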

The warning messages should not affect the GRanges object unless you wish to subset it by name. In that case, subsetting by a duplicated (or NA) name can give unexpected behavior. Of course, you can always manually set the names after the object is created.
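As a rough illustration of setting the names manually, here is a minimal base R sketch. It works on a plain character vector of made-up transcript names rather than a real GRanges object, and the placeholder prefix `unnamed_tx_` and the suffix `_dup` are arbitrary choices for this example, not anything crisprDesign does itself:

```r
# Hypothetical transcript names standing in for the names of an annotation
# object; NA marks a transcript that had no "transcript_id" attribute.
tx_names <- c("tx_A", NA, "tx_B", "tx_A", NA)

# Check for the two problems flagged by the warnings.
has_missing <- anyNA(tx_names)
has_duplicates <- anyDuplicated(tx_names[!is.na(tx_names)]) > 0

# Replace each NA with a numbered placeholder, then disambiguate repeats.
fixed <- tx_names
is_missing <- is.na(fixed)
fixed[is_missing] <- paste0("unnamed_tx_", seq_len(sum(is_missing)))
fixed <- make.unique(fixed, sep = "_dup")

print(fixed)

# Assumption: `gr` is whichever named object you want to repair, e.g.
# names(gr) <- fixed
```

After this step every name is non-missing and unique, so subsetting by name is no longer ambiguous. Whether renaming is appropriate depends on how the names are used downstream, so treat this as a starting point.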