adavi4 / SAI-10k-calc

MIT License

Error pre-processing transcript list using hg38 as reference #4

Closed: andirisch closed this issue 4 months ago

andirisch commented 5 months ago

Hi, I'm getting an error when I try to pre-process my transcript gene list using hg38 as the reference:

Rscript download_tx.R --gene_list "gene_list.txt" --out_refseq "output_refseq.txt" --out_tx_spliceai "output_spliceai.txt" --ref hg38

When I don't change the reference (i.e. I leave out --ref), it works:

Rscript download_tx.R --gene_list "gene_list.txt" --out_refseq "output_refseq.txt" --out_tx_spliceai "output_spliceai.txt"

I'm just trying to avoid lifting over my variants every time. I suspect one of the genes in my list (see below) is causing the following error:

Pre-processing transcript list
Error in .order_seqlevels(chrom_sizes[, "chrom"]) :
  !anyNA(m32) is not TRUE
Calls: ucscTableQuery ... .get_chrom_info_for_registered_UCSC_genome -> GET_CHROM_SIZES -> .order_seqlevels -> stopifnot
Execution terminated

I couldn't figure out how to debug this myself.

Gene List:

Gene      RefSeq_ID
BRCA2     NM_000059.4
RAD51C    NM_058216.3
BMPR1A    NM_004329.3
CHEK2     NM_007194.4
MSH2      NM_000251.3
SMAD4     NM_005359.6
NF2       NM_000268.4
STK11     NM_000455.5
MLH1      NM_000249.4
POLD1     NM_002691.4
PALB2     NM_024675.4
NF1       NM_001042492.3
ATM       NM_000051.4
BARD1     NM_000465.4
TSC2      NM_000548.5
POLE      NM_006231.4
PMS2      NM_000535.7
BRCA1     NM_007294.4
BRIP1     NM_032043.3
SMARCA4   NM_003072.5
TP53      NM_000546.6
CDH1      NM_004360.5
RAD51D    NM_002878.4
PTEN      NM_000314.8
APC       NM_000038.6
MSH6      NM_000179.3
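One quick way to rule out a malformed entry before blaming the reference build is to validate the list itself. The following is a minimal Python sketch, not part of SAI-10k-calc. It assumes gene_list.txt has whitespace-separated columns with an optional "Gene RefSeq_ID" header, as in the paste above, and the function names are hypothetical. The sketch flags any gene whose accession is not a versioned NM_/NR_ ID.

```python
import re
from typing import List, Tuple

# Versioned RefSeq transcript accession such as NM_000059.4 (NM_ = mRNA, NR_ = non-coding RNA).
REFSEQ_RE = re.compile(r"^N[MR]_\d+\.\d+$")


def read_gene_list(text: str) -> List[Tuple[str, str]]:
    """Parse 'Gene RefSeq_ID' rows, assuming whitespace-separated columns
    and an optional header line (an assumption about gene_list.txt)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if rows and rows[0] == ["Gene", "RefSeq_ID"]:
        rows = rows[1:]  # drop the header row
    pairs = []
    for fields in rows:
        if len(fields) != 2:
            raise ValueError(f"expected 2 columns, got {fields!r}")
        pairs.append((fields[0], fields[1]))
    return pairs


def malformed_ids(pairs: List[Tuple[str, str]]) -> List[str]:
    """Return the genes whose accession is not a versioned NM_/NR_ ID."""
    return [gene for gene, tx in pairs if not REFSEQ_RE.match(tx)]


# Toy input: the TP53 row deliberately lacks a version suffix.
example = "Gene\tRefSeq_ID\nBRCA2\tNM_000059.4\nTP53\tNM_000546\n"
pairs = read_gene_list(example)
print(pairs)                 # [('BRCA2', 'NM_000059.4'), ('TP53', 'NM_000546')]
print(malformed_ids(pairs))  # ['TP53']
```

Every accession in the list above is a versioned NM_ ID, so a check like this should come back empty for it. That points away from a formatting problem in the list.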

Cheers, Andi

adavi4 commented 4 months ago

Hi andirisch,

We are currently investigating this issue but have been unable to replicate the exact error.

In the meantime, are you able to run the download_tx.R script with the --refseq-full option? An ncbiRefSeq table can be downloaded in advance from the UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables): select the relevant assembly (here hg38), set the track to "NCBI RefSeq" and set the table to "RefSeq All (ncbiRefSeq)".
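Before re-running with --refseq-full, it may help to confirm that every transcript in the gene list is actually present in the downloaded table. The following is a minimal Python sketch, independent of download_tx.R, with hypothetical function names. It assumes the table was saved from the Table Browser as tab-separated text whose first line is a '#'-prefixed header that includes the name and chrom columns. It reports accessions that are missing from the table and accessions placed on more than one sequence, possibly including alternate contigs.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def index_by_accession(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each transcript name (e.g. NM_000059.4) to the chromosomes it is placed on.

    Assumes a tab-separated Table Browser export whose first line is a
    header starting with '#' and containing 'name' and 'chrom' columns.
    """
    it = iter(lines)
    header = next(it).lstrip("#").rstrip("\r\n").split("\t")
    placements: Dict[str, List[str]] = {}
    for row in csv.DictReader(it, fieldnames=header, delimiter="\t"):
        placements.setdefault(row["name"], []).append(row["chrom"])
    return placements


def check_transcripts(
    accessions: Iterable[str], placements: Dict[str, List[str]]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (accessions absent from the table, accessions placed more than once)."""
    accessions = list(accessions)
    missing = [acc for acc in accessions if acc not in placements]
    multi = {acc: placements[acc] for acc in accessions if len(placements.get(acc, [])) > 1}
    return missing, multi


# Toy rows standing in for a real export (values are illustrative only).
sample = [
    "#bin\tname\tchrom\tstrand\n",
    "0\tNM_000059.4\tchr13\t+\n",
    "0\tNM_000546.6\tchr17\t-\n",
    "0\tNM_000546.6\tchr17_demo_alt\t-\n",
]
placements = index_by_accession(sample)
missing, multi = check_transcripts(["NM_000059.4", "NM_000546.6", "NM_000179.3"], placements)
print(missing)  # ['NM_000179.3']
print(multi)    # {'NM_000546.6': ['chr17', 'chr17_demo_alt']}
```

With a real export, pass the open file handle instead of the toy list, for example: with open("ncbiRefSeq_hg38.txt") as fh: placements = index_by_accession(fh). The filename is hypothetical. Reading the header from the file, instead of hard-coding column positions, keeps the check working if the export contains more or fewer fields.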