GoekeLab / sg-nex-data

Nanopore RNA-Seq data from the Singapore Nanopore-Expression Project

Questions about sequin used #19

Open Kyung-TaeLee opened 2 years ago

Kyung-TaeLee commented 2 years ago

Hi, thank you for providing such a wonderful resource to the community. I am trying to analyze the data to compare the performance of multiple quantification programs. To do that, I downloaded the cDNA PCR sequencing data (SQK-PCS109) and the short-read data with sequins included. I first ran Anaquin (the sequin analysis software) to check the consistency between expected and estimated expression (Anaquin uses Kallisto). However, the correlation was very poor (around 0.1). Then I realized that Anaquin provides sequin version 2.4, but the Excel file provided with the original manuscript states that sequin version 1 was used. Do the two versions differ in the transcripts used and their concentrations? I tried to find the sequin version 1 reference files (decoy chromosome, gene annotation in GTF) but couldn't find any. I visited the sequinstandard website and tried to access the resources there, but I can't: I have to log in to access the files, and the site won't let me register (I don't know why). Could you provide the reference files for the sequins used in the study, along with the file that contains the expected concentrations? Thank you and have a nice day.

alexyfyf commented 12 months ago

I would like to know as well. I found that the reformatted gtf contains bambu-generated transcripts with sequins, but I am not sure whether these are the complete sequin transcripts.

cying111 commented 8 months ago

Hi,

Sorry for the late reply.

For the sequin reference, you may download the fasta and gtf files using the links below:

gtf file: http://sg-nex-data.s3.amazonaws.com/data/annotations/gtf_file/hg38_sequins_SIRV_ERCCs_longSIRVs_v5_reformatted.gtf
This is the complete gtf file that we used for our analysis. For your case, you can subset it to only the sequin annotations by filtering on either the sequin gene or transcript names.

fasta file: http://sg-nex-data.s3.amazonaws.com/data/annotations/genome_fasta/hg38_sequins_SIRV_ERCCs_longSIRVs.fa
Similarly, for the fasta file, you can extract only the sequin sequences by keeping chrIS only.
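The subsetting described above could be sketched in Python roughly as follows. This is only an illustration, not the script used for the study. It assumes that the sequin records in the gtf sit on the same contig name as in the fasta (chrIS), which should be checked against the downloaded files; filtering on sequin gene or transcript names is the alternative if that does not hold. The function names are made up for this example.

```python
def filter_gtf_by_contig(lines, contig="chrIS"):
    """Yield GTF lines whose first column (seqname) equals `contig`.

    Comment/header lines starting with '#' are passed through unchanged.
    Assumption: sequin annotations use the contig name 'chrIS' in the GTF.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
        elif line.split("\t", 1)[0] == contig:
            yield line


def filter_fasta_by_contig(lines, contig="chrIS"):
    """Yield the header and sequence lines of the FASTA record named `contig`."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] == contig
        if keep:
            yield line


def subset_file(src, dst, line_filter, contig="chrIS"):
    """Stream `src` through `line_filter` and write the kept lines to `dst`."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(line_filter(fin, contig))
```

After downloading the two files, something like `subset_file("hg38_sequins_SIRV_ERCCs_longSIRVs_v5_reformatted.gtf", "sequins_only.gtf", filter_gtf_by_contig)` and the same call with `filter_fasta_by_contig` for the fasta would write the sequin-only subsets (the output file names here are placeholders). Both filters stream line by line, so the genome fasta does not need to fit in memory.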

For the sequin concentration, we have recently added the concentration file that we have used for the original manuscript: https://github.com/GoekeLab/sg-nex-data/blob/master/docs/RNAsequins_MixA.xlsx
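Once the expected concentrations from that file and the estimated abundances from a quantifier are loaded, the expected-versus-estimated check mentioned in the question could be sketched as below. This is a generic illustration, not Anaquin's method: the log2 transform and the pseudocount are assumptions made here, the function name is made up, and the inputs are plain dictionaries keyed by transcript ID because the column names of the Excel file are not described in this thread.

```python
import math


def log_pearson(expected, estimated, pseudocount=1.0):
    """Pearson correlation of log2(value + pseudocount) over shared IDs.

    `expected` and `estimated` map transcript ID -> abundance. IDs present
    in only one of the two dictionaries are ignored.
    """
    shared = sorted(set(expected) & set(estimated))
    if len(shared) < 2:
        raise ValueError("need at least two shared transcript IDs")

    x = [math.log2(expected[t] + pseudocount) for t in shared]
    y = [math.log2(estimated[t] + pseudocount) for t in shared]

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")

    return cov / math.sqrt(var_x * var_y)
```

If very few transcript IDs are shared between the two inputs, that by itself suggests the annotation used for quantification does not match the concentration table, which is worth ruling out before interpreting a low correlation.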

Let me know if you still have issues related to this.

Thank you