VCCRI / Sierra

Discover differential transcript usage from polyA-captured single cell RNA-seq data
GNU General Public License v3.0

Error when running CountPeaks #48

Open idupanloup opened 2 years ago

idupanloup commented 2 years ago

I get an error when using the CountPeaks function for one of my samples:

There are 14963 whitelist barcodes.
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
There are 7  sites
Doing counting for each site...
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'writeMM' for signature '"NULL"'
In addition: Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information
  was ignored.

Do you know what the problem could be? (I saw a similar issue posted by another user, but did not find the solution.) Thanks!

rj-patrick commented 2 years ago

Hi @idupanloup,

Apologies for the slow response. This error occurs when no UMI counts are identified in your data for the set of provided peaks, so a 'NULL' matrix is returned and passed on to writeMM. As you are only inputting 7 peaks, based on your previous post, I'd say that one of your samples simply has no coverage over these peak coordinates, which is why the error is being returned.

Cheers, Ralph
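One way to confirm this diagnosis before calling CountPeaks is to count how many reads carry a whitelisted cell barcode and start inside any peak. The sketch below is a rough check, not Sierra's own counting logic. It assumes SAM text as input (for example piped from `samtools view`), looks only at each read's leftmost position rather than the full alignment, and takes peaks as hypothetical `(chrom, start, end)` tuples.

```python
def count_reads_in_peaks(sam_lines, peaks, whitelist):
    """Count reads per peak that start inside the peak and carry a whitelisted CB tag.

    sam_lines : iterable of SAM-format text lines (header lines are skipped)
    peaks     : list of (chrom, start, end) tuples, 1-based inclusive
                (a hypothetical format chosen for this sketch)
    whitelist : set of cell barcodes exactly as they appear in the CB tag
    """
    counts = {peak: 0 for peak in peaks}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        chrom, pos = fields[2], int(fields[3])

        # Optional tags start at column 12; find the cell barcode tag.
        barcode = None
        for tag in fields[11:]:
            if tag.startswith("CB:Z:"):
                barcode = tag[5:]
                break
        if barcode not in whitelist:
            continue

        # Simplification: only the leftmost aligned position is tested.
        for peak in peaks:
            p_chrom, start, end = peak
            if chrom == p_chrom and start <= pos <= end:
                counts[peak] += 1
    return counts
```

If every count comes back as zero, the NULL matrix described above is the expected outcome. A difference in chromosome naming between the peaks and the BAM (for example `chr5` versus `5`) would also produce all zeros in this check.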

rbarbieri86 commented 1 year ago

Hello, sorry to piggyback on an older ticket, but I have a similar error occurring:

There are 2096 whitelist barcodes.
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
There are 37451  sites
Doing counting for each site...
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'writeMM' for signature '"NULL"'
In addition: Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.

The BAM file used contains only chromosome 5, extracted from a bigger BAM I was hoping to use; I thought this would rule out a memory issue. The appropriate flags should also be present, as the original BAM file was generated with this STARsolo command:

STAR --runThreadN 8 --genomeDir ~/GencodeM29_star/ --soloType Droplet --soloCBwhitelist 737K-august-2016.txt --soloCellFilter EmptyDrops_CR --outSAMattributes NH HI AS nM GX GN CB UB sS sQ sM NM --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat --readFilesIn ERR4898566_2.fastq.gz ERR4898566_1.fastq.gz --outFileNamePrefix starsolo_out/ERR4898566/ERR4898566_

Is there any other possible reason this error occurs?

Thank you very much in advance!

Bridream commented 1 year ago

Hi, @rbarbieri86,

Maybe your whitelist is wrong. Although you have called many peak sites, none of the barcodes in your reads match the whitelist, so you get an empty peak x cell matrix after running the CountPeaks function.
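A quick way to test this hypothesis is to compare the whitelist with the barcodes actually present in the BAM's CB tags. The sketch below is only an illustration: the function name and the returned keys are hypothetical, and it assumes the barcodes have already been extracted into plain lists of strings. It also reports the overlap after stripping a trailing "-1", since one possible cause of zero matches is that Cell Ranger-style `barcodes.tsv` entries carry that suffix while the CB tags in the BAM may not.

```python
def barcode_overlap(whitelist, bam_barcodes, suffix="-1"):
    """Compare whitelist barcodes with barcodes seen in the BAM.

    Returns the number of exact matches and the number of matches after
    removing `suffix` from both sides, so a pure suffix mismatch shows up
    as exact == 0 but suffix_stripped > 0.
    """
    def strip_suffix(barcode):
        if suffix and barcode.endswith(suffix):
            return barcode[: -len(suffix)]
        return barcode

    # Deduplicate and drop empty lines or stray whitespace.
    wl = {b.strip() for b in whitelist if b.strip()}
    seen = {b.strip() for b in bam_barcodes if b.strip()}

    exact = len(wl & seen)
    stripped = len({strip_suffix(b) for b in wl} & {strip_suffix(b) for b in seen})
    return {
        "whitelist": len(wl),
        "bam": len(seen),
        "exact": exact,
        "suffix_stripped": stripped,
    }
```

If `exact` is zero but `suffix_stripped` is not, the two barcode sets differ only in formatting, and the whitelist passed to CountPeaks would need to be rewritten to match the BAM.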

rbarbieri86 commented 1 year ago

Hi Bridream,

OK, that would be weird for the Sierra whitelist, as I used the barcodes.tsv files as indicated. However, I could double-check the STARsolo whitelist, as that one was downloaded following the manual's instructions, if I remember correctly. I will look into that and get back to you.

rbarbieri86 commented 1 year ago

Hi Bridream,

I had indeed used the wrong whitelist when aligning with STARsolo, so I replaced it with the correct one (3M-february-2018.txt.gz). However, the error is still there, exactly as before:

There are 2096 whitelist barcodes.
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
There are 37451  sites
Doing counting for each site...
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'writeMM' for signature '"NULL"'
In addition: Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.

I am working on publicly available data, which I was able to analyze with another tool (MAAPER). Any idea why this is still happening?

Bridream commented 1 year ago

@rbarbieri86

Did you find the whitelist somewhere online? Maybe you can try generating the whitelist from your raw data, so that you can be sure it is right. If you are working on publicly available data, you can just extract the barcode whitelist from the processed data, for example from the gene-by-cell matrix.

rbarbieri86 commented 1 year ago

The whitelist is provided on the 10X website and is also linked in the STARsolo guide on GitHub. I think I can try using the barcodes.tsv files as whitelists after removing the "-1" at the end. I have also noticed a discrepancy between the data and their description: the data are labeled as 10X 3' v3, but the barcode reads are shorter (26 bp instead of 28 bp), which seems to indicate a v2 kit. However, I was using the v2 whitelist beforehand and still had the same issue. Looking into the STARsolo logs, there does seem to be a problem there, as a lot of cells do not have a valid barcode. I will try a few more runs with different parameters and get back to you.
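The 26 versus 28 observation can be checked directly on the barcode FASTQ, which in the STAR command above is the second file passed to `--readFilesIn` (`ERR4898566_1.fastq.gz`). In 10x 3' libraries the barcode read holds a 16 bp cell barcode followed by the UMI, which is 10 bp for v2 (26 bp in total) and 12 bp for v3 (28 bp in total). The sketch below tallies read lengths over the first records of a FASTQ; the function names are hypothetical, and it assumes a standard four-line-per-record file.

```python
import gzip
from collections import Counter

# Cell barcode + UMI length for 10x 3' kits: v2 = 16 + 10, v3 = 16 + 12.
EXPECTED_R1_LENGTH = {26: "v2", 28: "v3"}


def r1_length_profile(fastq_path, max_reads=10000):
    """Tally sequence lengths over the first `max_reads` FASTQ records."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    lengths = Counter()
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 == 1:  # second line of each record is the sequence
                lengths[len(line.strip())] += 1
    return lengths


def guess_chemistry(lengths):
    """Map the most common read length to a kit version, or None if unknown."""
    if not lengths:
        return None
    most_common_length = lengths.most_common(1)[0][0]
    return EXPECTED_R1_LENGTH.get(most_common_length)
```

A result of `None` only means the dominant length is neither 26 nor 28, which can happen when the barcode read was sequenced longer than strictly needed, so it does not by itself settle which kit was used.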

Thanks for your assistance by the way.

rbarbieri86 commented 1 year ago

Hello, just thought of giving an update.

There were indeed issues with the STARsolo step, as the FASTQ files used were partially corrupted. After a few re-downloads, and after confirming that the kit used was v2, I tried running CountPeaks once more, which resulted in the same error as above. I am fairly sure that the input is fine now, as the STARsolo logs look OK, so I will try my luck with MAAPER once more.
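For anyone hitting the same partially corrupted downloads, a gzipped FASTQ can be verified before alignment by decompressing it to the end and watching for errors. This is a minimal sketch using only the Python standard library, with a hypothetical function name; it does the same job as running `gzip -t` on the file.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses cleanly to the end.

    Truncated files raise EOFError, bad headers or checksums raise
    gzip.BadGzipFile (a subclass of OSError), and damaged compressed data
    raises zlib.error. A missing or unreadable file also returns False,
    because it raises OSError too.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data in chunks to keep memory use low.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        return False
    return True
```

Note that this only checks the compression layer. A file can pass and still contain malformed FASTQ records, so the STARsolo logs remain the more informative check.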