RenaeAtkinson commented 1 year ago

Directory ./ already exists. Writing into existing directory.. mkdir: cannot create directory ‘.//SCASA_testscasaHNVC02_20230414001259/’: File exists

Preparing for alignment.. Indexing reference.. Directory .//SCASA_testscasaHNVC02_20230414001259/0PRESETS//REF_INDEX/ already exists. Writing into existing directory.. Version Info: ### PLEASE UPGRADE SALMON ###

A newer version of salmon with important bug fixes and improvements is available.

The newest version, available at https://github.com/COMBINE-lab/salmon/releases contains new features, improvements, and bug fixes; please upgrade at your earliest convenience.

Sign up for the salmon mailing list to hear about new versions, features and updates at: https://oceangenomics.com/subscribe [2023-04-14 00:12:59.520] [jLog] [warning] The salmon index is being built without any decoy sequences. It is recommended that decoy sequence (either computed auxiliary decoy sequence or the genome of the organism) be provided during indexing. Further details can be found at https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode. [2023-04-14 00:12:59.520] [jLog] [info] building index out : .//SCASA_testscasaHNVC02_20230414001259/0PRESETS//REF_INDEX/ [2023-04-14 00:12:59.527] [puff::index::jointLog] [info] Running fixFasta

[Step 1 of 4] : counting k-mers

[2023-04-14 00:13:07.009] [puff::index::jointLog] [warning] Removed 236 transcripts that were sequence duplicates of indexed transcripts. [2023-04-14 00:13:07.010] [puff::index::jointLog] [warning] If you wish to retain duplicate transcripts, please use the `--keepDuplicates` flag [2023-04-14 00:13:07.012] [puff::index::jointLog] [info] Replaced 4 non-ATCG nucleotides [2023-04-14 00:13:07.012] [puff::index::jointLog] [info] Clipped poly-A tails from 11,186 transcripts wrote 76267 cleaned references [2023-04-14 00:13:07.789] [puff::index::jointLog] [info] Filter size not provided; estimating from number of distinct k-mers [2023-04-14 00:13:10.356] [puff::index::jointLog] [info] ntHll estimated 85097693 distinct k-mers, setting filter size to 2^31 Threads = 2 Vertex length = 31 Hash functions = 5 Filter size = 2147483648 Capacity = 2 Files: .//SCASA_testscasaHNVC02_20230414001259/0PRESETS//REF_INDEX/ref_k31_fixed.fa

Round 0, 0:2147483648 Pass Filling Filtering 1 36 77 2 5 0 True junctions count = 277411 False junctions count = 422333 Hash table size = 699744 Candidate marks count = 4646414

Reallocating bifurcations time: 0 True marks count: 3337299 Edges construction time: 6

Distinct junctions = 277411

TwoPaCo::buildGraphMain:: allocated with scalable_malloc; freeing. TwoPaCo::buildGraphMain:: Calling scalable_allocation_command(TBBMALLOC_CLEAN_ALL_BUFFERS, 0); allowedIn: 12 Max Junction ID: 318881 seen.size():2551057 kmerInfo.size():318882 approximateContigTotalLength: 66002535 counters for complex kmers: (prec>1 & succ>1)=26025 | (succ>1 & isStart)=63 | (prec>1 & isEnd)=73 | (isStart & isEnd)=10 contig count: 433949 element count: 98078572 complex nodes: 26171

# of ones in rank vector: 433948

[2023-04-14 00:15:32.167] [puff::index::jointLog] [info] Starting the Pufferfish indexing by reading the GFA binary file. [2023-04-14 00:15:32.167] [puff::index::jointLog] [info] Setting the index/BinaryGfa directory .//SCASA_testscasaHNVC02_20230414001259/0PRESETS//REF_INDEX size = 98078572

| Loading contigs | Time = 47.228 ms

size = 98078572

| Loading contig boundaries | Time = 25.94 ms

Number of ones: 433948 Number of ones per inventory item: 512 Inventory entries filled: 848 433948 [2023-04-14 00:15:32.408] [puff::index::jointLog] [info] Done wrapping the rank vector with a rank9sel structure. [2023-04-14 00:15:32.412] [puff::index::jointLog] [info] contig count for validation: 433,948 [2023-04-14 00:15:32.736] [puff::index::jointLog] [info] Total # of Contigs : 433,948 [2023-04-14 00:15:32.736] [puff::index::jointLog] [info] Total # of numerical Contigs : 433,948 [2023-04-14 00:15:32.756] [puff::index::jointLog] [info] Total # of contig vec entries: 3,427,302 [2023-04-14 00:15:32.756] [puff::index::jointLog] [info] bits per offset entry 22 [2023-04-14 00:15:32.870] [puff::index::jointLog] [info] Done constructing the contig vector. 433949 [2023-04-14 00:15:33.302] [puff::index::jointLog] [info] # segments = 433,948 [2023-04-14 00:15:33.303] [puff::index::jointLog] [info] total length = 98,078,572 [2023-04-14 00:15:33.331] [puff::index::jointLog] [info] Reading the reference files ... 
[2023-04-14 00:15:34.093] [puff::index::jointLog] [info] positional integer width = 27 [2023-04-14 00:15:34.093] [puff::index::jointLog] [info] seqSize = 98,078,572 [2023-04-14 00:15:34.093] [puff::index::jointLog] [info] rankSize = 98,078,572 [2023-04-14 00:15:34.093] [puff::index::jointLog] [info] edgeVecSize = 0 [2023-04-14 00:15:34.093] [puff::index::jointLog] [info] num keys = 85,060,132 for info, total work write each : 2.331 total work inram from level 3 : 4.322 total work raw : 25.000 [Building BooPHF] 100 % elapsed: 0 min 8 sec remaining: 0 min 0 sec Bitarray 445693632 bits (100.00 %) (array + ranks ) final hash 0 bits (0.00 %) (nb in final hash 0) [2023-04-14 00:15:41.958] [puff::index::jointLog] [info] mphf size = 53.1308 MB [2023-04-14 00:15:42.025] [puff::index::jointLog] [info] chunk size = 49,039,286 [2023-04-14 00:15:42.025] [puff::index::jointLog] [info] chunk 0 = [0, 49,039,286) [2023-04-14 00:15:42.025] [puff::index::jointLog] [info] chunk 1 = [49,039,286, 98,078,542) [2023-04-14 00:15:53.934] [puff::index::jointLog] [info] finished populating pos vector [2023-04-14 00:15:53.934] [puff::index::jointLog] [info] writing index components [2023-04-14 00:15:54.455] [puff::index::jointLog] [info] finished writing dense pufferfish index [2023-04-14 00:15:54.494] [jLog] [info] done building index Finnished indexing reference.. Begins pseudo-alignment.. nohup: redirecting stderr to stdout Congratulations! Pseudo-alignment has completed in 30 seconds! Scasa quantification has started.. Begin Scasa quantification for sample SRR10340946.. 
Error in file(con, "r") : cannot open the connection
Calls: readLines -> file
In addition: Warning message:
In file(con, "r") : cannot open file './/SCASA_testscasaHNVC02_20230414001259/1ALIGN//SRR10340946_alignout/alevin/bfh.txt': No such file or directory
Execution halted
Loading required package: iterators
Loading required package: parallel
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: load -> readChar
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) : cannot open compressed file '/network/rit/lab/conklinlab/Renae/HNVC/HNVC02/SRR10340946/SCASA_testscasaHNVC02_20230414001259/2QUANT/SRR10340946_quant/Sample_eqClass.RData', probable reason 'No such file or directory'
Execution halted
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: load -> readChar
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) : cannot open compressed file './/SCASA_testscasaHNVC02_20230414001259/2QUANT//SRR10340946_quant//scasa_isoform_expression.RData', probable reason 'No such file or directory'
Execution halted
Congratulations! Scasa single cell RNA-Seq transcript quantification has completed in 30 seconds!
All done!

nghiavtr commented 1 year ago

Hi @RenaeAtkinson,

Thank you for using Scasa in your research.

The error occurs at the alevin mapping step; most likely alevin cannot find the input fastq files. Please check that the file names are in the right format. Note that the names of the fastq files should contain "R1" and "R2"; please see the details here: https://github.com/eudoraleer/scasa/wiki#6-input-fastq-files
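A quick way to run the file-name check described above is sketched here. `check_fastq` is a hypothetical helper, not part of Scasa; it only verifies that a file exists in the input directory and that its name contains the given read tag, and the paths in the usage comments are placeholders.

```shell
# Hypothetical helper (not part of Scasa): check that a FASTQ file exists
# in the input directory and that its name contains the expected read tag.
check_fastq() {
    dir=$1
    name=$2
    tag=$3
    if [ ! -f "$dir/$name" ]; then
        echo "missing: $dir/$name"
        return 1
    fi
    case "$name" in
        *"$tag"*) echo "ok: $name" ;;
        *)        echo "bad name (no $tag): $name"; return 1 ;;
    esac
}

# Example usage (placeholder directory):
# check_fastq /path/to/input SRR10340946_R1.fastq R1
# check_fastq /path/to/input SRR10340946_R2.fastq R2
```

If either call reports a missing file, the value passed to --in or --fastq is the first thing to recheck.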

Best, Nghia

RenaeAtkinson commented 1 year ago

Hi Nghia,

This is my code:

scasa –project HNVC02 \
--mapper salmon_alevin \
--align YES \
--quant YES \
--in /network/rit/lab/conklinlab/Renae/HNVC/HNVC02/SRR10340946/ \
--fastq SRR10340946_R1.fastq,SRR10340946_R2.fastq\
--out /network/rit/lab/conklinlab/Renae/SCASA/HNVC02/ \
--ref /network/rit/lab/conklinlab/Renae/SCASA/refMrna.fa \
--whitelist /network/rit/lab/conklinlab/Renae/HNVC/V2/737K-august-2016.txt \
--tech 10xv2 \
--nthreads 32 \
--index YES \
--xmatrix alevin

Renae

nghiavtr commented 1 year ago

hi @RenaeAtkinson,

I don't see a clear problem in your command, except that it should be "--project" (two hyphens) rather than "–project" (an en dash). Most of the parameters you set are already the default values, so can you try again with the shorter version below:

scasa --in /network/rit/lab/conklinlab/Renae/HNVC/HNVC02/SRR10340946/ \
--fastq SRR10340946_R1.fastq,SRR10340946_R2.fastq \
--out /network/rit/lab/conklinlab/Renae/SCASA/HNVC02/ \
--ref /network/rit/lab/conklinlab/Renae/SCASA/refMrna.fa \
--whitelist /network/rit/lab/conklinlab/Renae/HNVC/V2/737K-august-2016.txt \
--tech 10xv2 \
--nthreads 32
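Dashes like the one in "–project" are often introduced when a command is copied through a word processor or email client. As a small sketch, the hypothetical helper below (not part of Scasa) reports whether a command string contains an en dash, which the shell will not treat as an option prefix.

```shell
# Hypothetical helper: succeed if the given text contains an en dash (–),
# which looks like a hyphen but is not recognised as an option prefix.
has_en_dash() {
    printf '%s\n' "$1" | grep -q -e '–'
}

# Example usage:
# has_en_dash 'scasa –project HNVC02' && echo "replace the en dash with --"
```

The same grep pattern can be pointed at a saved script file to find the offending line.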

Best, Nghia

RenaeAtkinson commented 1 year ago

Hi Nghia,

Any idea how I can fix these errors I am getting?

Error in file(con, "r") : cannot open the connection
Calls: readLines -> file
In addition: Warning message:
In file(con, "r") : cannot open file '/network/rit/lab/conklinlab/Renae/SCASA/HNVC02//SCASA_My_Project_20230418104957/1ALIGN//SRR10340946_alignout/alevin/bfh.txt': No such file or directory
Execution halted
Loading required package: iterators
Loading required package: parallel
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: load -> readChar
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) : cannot open compressed file '/network/rit/lab/conklinlab/Renae/SCASA/HNVC02/SCASA_My_Project_20230418104957/2QUANT/SRR10340946_quant/Sample_eqClass.RData', probable reason 'No such file or directory'
Execution halted
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: load -> readChar
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) : cannot open compressed file '/network/rit/lab/conklinlab/Renae/SCASA/HNVC02//SCASA_My_Project_20230418104957/2QUANT//SRR10340946_quant//scasa_isoform_expression.RData', probable reason 'No such file or directory'
Execution halted

Best, Renae

nghiavtr commented 1 year ago

Hi, The error indicates that the Alevin alignment was not performed. My first guess is that the input file names are not correct, but that would be strange because they look fine.

Can you test this by renaming SRR10340946_R1.fastq to Sample_01_S1_L001_R1_001.fastq and SRR10340946_R2.fastq to Sample_01_S1_L001_R2_001.fastq, as in the sample files of Scasa?

Another possibility is that the R1 and R2 files do not contain the expected information (one should hold the barcode+UMI and the other the sequence content); in that case, just swap the two file names.
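One way to tell which file holds the barcode+UMI is to look at the read length. For 10x Chromium v2 chemistry the barcode read is normally 26 nt (16 nt cell barcode plus 10 nt UMI), while the cDNA read is longer. The sketch below uses a hypothetical helper, `first_read_length`, and assumes standard four-line FASTQ records; reads downloaded from SRA may have been trimmed or split differently, so treat the result as a hint only.

```shell
# Hypothetical helper: print the length of the first read in a FASTQ file.
# Handles plain and gzip-compressed files; assumes 4-line FASTQ records.
first_read_length() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'NR == 2 { print length($0); exit }'
}

# Example usage:
# first_read_length SRR10340946_R1.fastq   # expected to be 26 for a 10xv2 barcode read
# first_read_length SRR10340946_R2.fastq   # expected to be the longer cDNA read
```

If the lengths come out the other way round, swapping the two names as suggested above is the likely fix.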

Please try these and let me know if either of them works, thanks!

Nghia

RenaeAtkinson commented 1 year ago

I had tried that naming convention before and that did not work.

The names are according to the split files downloaded from SRA

Renae


nghiavtr commented 1 year ago

Hi,

It is really strange. Can you post the first few lines of R1 and R2 here? And if possible, can you send me the files or a subset of reads from them? I will try to reproduce the error by running Scasa on the files.
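A small subset of reads for sharing can be cut from the top of each file. The sketch below uses a hypothetical helper, `subset_fastq`, and assumes standard four-line FASTQ records; taking the same number of reads from R1 and R2 keeps the pairs in sync.

```shell
# Hypothetical helper: write the first N reads (4 lines each) of a FASTQ
# file to stdout. Handles plain and gzip-compressed input.
subset_fastq() {
    n_reads=$1
    fq=$2
    case "$fq" in
        *.gz) gzip -dc "$fq" ;;
        *)    cat "$fq" ;;
    esac | head -n "$((n_reads * 4))"
}

# Example usage: show the first two reads, then save 100000 read pairs.
# subset_fastq 2 SRR10340946_R1.fastq
# subset_fastq 100000 SRR10340946_R1.fastq > subset_R1.fastq
# subset_fastq 100000 SRR10340946_R2.fastq > subset_R2.fastq
```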

Nghia

RenaeAtkinson commented 1 year ago

Sure thing,

It’s from a publicly available dataset and the sample I am working on is this SRA fastq file: https://www.ncbi.nlm.nih.gov/sra/?term=SRR10340946

Renae


nghiavtr commented 1 year ago

hi @RenaeAtkinson ,

Well, I cannot reproduce your error; please see the commands I tried below. So the problem is not the input data format.

I guess you might have missed a step, for example forgetting to add scasa or salmon alevin to your paths (export PATH and export LD_LIBRARY_PATH).

Nghia


##################################################################
# 1. Download scasa:
##################################################################
wget https://github.com/eudoraleer/scasa/releases/download/scasa.v1.0.0/scasa_v1.0.0.tar.gz
tar -xzvf scasa_v1.0.0.tar.gz
export PATH=$PWD/scasa:$PATH

##################################################################
# 2. Download salmon alevin:
##################################################################
wget https://github.com/COMBINE-lab/salmon/releases/download/v1.4.0/salmon-1.4.0_linux_x86_64.tar.gz
tar -xzvf salmon-1.4.0_linux_x86_64.tar.gz
export PATH=$PWD/salmon-latest_linux_x86_64/bin:$PATH
export LD_LIBRARY_PATH=$PWD/salmon-latest_linux_x86_64/lib:$LD_LIBRARY_PATH

##################################################################
# 3. Download UCSC hg38 cDNA fasta reference:
##################################################################
mkdir Annotation
cd Annotation
wget https://www.dropbox.com/s/xoa6yl562a5lv35/refMrna.fa.gz
refPath=$PWD/refMrna.fa.gz

# use the raw file URL; the /blob/ page URL would save an HTML page instead of the barcode list
wget https://github.com/10XGenomics/cellranger/raw/master/lib/python/cellranger/barcodes/737K-august-2016.txt
whitelistFile=$PWD/737K-august-2016.txt

cd ..

##################################################################
# 4. Download the CITE-seq RNA samples:
##################################################################

mkdir CiteSeqData
cd CiteSeqData

### use sratools to download the sample
# module load sratools/3.0.0
prefetch SRR10340946
cd SRR10340946
fastq-dump --gzip --split-3 SRR10340946.sra

#change the name
mv SRR10340946_1.fastq.gz SRR10340946_L001_R1_001.fastq.gz
mv SRR10340946_2.fastq.gz SRR10340946_L001_R2_001.fastq.gz

InputDir=$PWD
cd ..

#number of threads
threadNum=$(nproc)

#run scasa
scasa --in $InputDir --fastq SRR10340946_L001_R1_001.fastq.gz,SRR10340946_L001_R2_001.fastq.gz --ref $refPath  --tech 10xv2 --nthreads $threadNum --whitelist $whitelistFile --out ScasaOut_SRR10340946
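
Before the final scasa call, it may help to confirm that the PATH exports took effect in the current shell. The sketch below uses a hypothetical helper, `check_tools`, which reports where each named program resolves; the tool names in the usage comment are assumptions about what the pipeline needs on the PATH.

```shell
# Hypothetical helper: report where each named tool is found on PATH.
# Returns non-zero if any tool is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if found=$(command -v "$tool"); then
            echo "$tool: $found"
        else
            echo "$tool: NOT FOUND on PATH"
            status=1
        fi
    done
    return "$status"
}

# Example usage:
# check_tools scasa salmon Rscript
```

If salmon resolves to a different installation than the one just downloaded (for example a conda copy), that mismatch is worth ruling out first.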

RenaeAtkinson commented 1 year ago

Hi Nghia,

I was trying to read a file into an R function (from a different package), but the format of the file did not match the format the function expects, so I will have to process the file in R first. I got the same error in R that I saw on the command line when running SCASA!

Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\renae\Desktop\Rprojects\SingleCellar-Learn\Human_genesets\Human_genesets\human.signature.genes.v2.gmt': No such file or directory

I am excited because now I know what the error with scasa meant. It meant that the R function was not able to process the file provided into the format that scasa requires and so produced the error. The error occurs at “Begin scasa quantification for sample SRR..”. Does that shed any light on what could be going on and how you could help me?

I would really love it if this could work. What you produced in your paper is exactly what I want.

Renae


nghiavtr commented 1 year ago

Hi @RenaeAtkinson,

If you see the message 'Error in file(con, "r") : cannot open the connection', it definitely means that the program cannot find the file, so it is not a problem in Scasa itself.

I ran Scasa on your sample SRR10340946 on my Linux computer, and it worked without error. I provided the commands previously (but forgot to put them in code format, very sorry), so I am posting them again below. I use sratools to download the SRR10340946 data. You just need to copy and paste the command lines and it should work.

Nghia

##################################################################
# 1. Download scasa:
##################################################################
wget https://github.com/eudoraleer/scasa/releases/download/scasa.v1.0.0/scasa_v1.0.0.tar.gz
tar -xzvf scasa_v1.0.0.tar.gz
export PATH=$PWD/scasa:$PATH

##################################################################
# 2. Download salmon alevin:
##################################################################
wget https://github.com/COMBINE-lab/salmon/releases/download/v1.4.0/salmon-1.4.0_linux_x86_64.tar.gz
tar -xzvf salmon-1.4.0_linux_x86_64.tar.gz
export PATH=$PWD/salmon-latest_linux_x86_64/bin:$PATH
export LD_LIBRARY_PATH=$PWD/salmon-latest_linux_x86_64/lib:$LD_LIBRARY_PATH

##################################################################
# 3. Download UCSC hg38 cDNA fasta reference:
##################################################################
mkdir Annotation
cd Annotation
wget https://www.dropbox.com/s/xoa6yl562a5lv35/refMrna.fa.gz
refPath=$PWD/refMrna.fa.gz

# use the raw file URL; the /blob/ page URL would save an HTML page instead of the barcode list
wget https://github.com/10XGenomics/cellranger/raw/master/lib/python/cellranger/barcodes/737K-august-2016.txt
whitelistFile=$PWD/737K-august-2016.txt

cd ..

##################################################################
# 4. Download the CITE-seq RNA samples:
##################################################################

mkdir CiteSeqData
cd CiteSeqData

### use sratools to download the sample
# module load sratools/3.0.0
prefetch SRR10340946
cd SRR10340946
fastq-dump --gzip --split-3 SRR10340946.sra

#change the name
mv SRR10340946_1.fastq.gz SRR10340946_L001_R1_001.fastq.gz
mv SRR10340946_2.fastq.gz SRR10340946_L001_R2_001.fastq.gz

InputDir=$PWD
cd ..

#number of threads
threadNum=$(nproc)

#run scasa
scasa --in $InputDir --fastq SRR10340946_L001_R1_001.fastq.gz,SRR10340946_L001_R2_001.fastq.gz --ref $refPath  --tech 10xv2 --nthreads $threadNum --whitelist $whitelistFile --out ScasaOut_SRR10340946
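
Since alevin depends on the barcode whitelist, a quick sanity check of the downloaded file can rule out one silent failure. The 10x v2 whitelist is a plain text file with one 16-nt barcode per line; a download that saved a web page rather than the raw file would fail the check below. `is_valid_whitelist` is a hypothetical helper, not part of Scasa or salmon.

```shell
# Hypothetical helper: succeed only if the file is non-empty and every line
# is a 16-nt barcode made of A, C, G and T.
is_valid_whitelist() {
    [ -s "$1" ] || return 1
    # grep -v selects lines that do NOT match; finding any means invalid
    ! grep -v -q -E '^[ACGT]{16}$' "$1"
}

# Example usage:
# is_valid_whitelist "$whitelistFile" || echo "whitelist looks wrong: $whitelistFile"
```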

RenaeAtkinson commented 1 year ago

Hi Nghia,

So I use salmon in a conda environment on my Linux server. I just updated it through conda, and I now have version 1.10.1. I did everything, including redownloading the SRA file. The first error I get is this:

mkdir: cannot create directory ‘ScasaOut_SRR10340946/SCASA_My_Project_20230518140856/’: File exists

Is this a problem with scasa?

The second error I get is:

Error in file(con, "r") : cannot open the connection
Calls: readLines -> file
In addition: Warning message:
In file(con, "r") : cannot open file 'ScasaOut_SRR10340946/SCASA_My_Project_20230518140856/1ALIGN//SRR10340946_L001_alignout/alevin/bfh.txt': No such file or directory
Execution halted

This suggests to me that the program is having a problem making the directory/file bfh.txt.

A similar error emerged later:

In readChar(con, 5L, useBytes = TRUE) : cannot open compressed file 'ScasaOut_SRR10340946/SCASA_My_Project_20230518140856/2QUANT//SRR10340946_L001_quant//scasa_isoform_expression.RData', probable reason 'No such file or directory'
Execution halted

Any ideas what could be the issue? Am I the only person having this problem?

Renae


nghiavtr commented 1 year ago

Hi @RenaeAtkinson ,

The first error, from mkdir, is harmless and can be ignored. The second error indicates that salmon alevin did not run properly, because no bfh.txt file exists, so yes, this is the main issue. I have no experience running Salmon through conda, but usually conda is not needed to run Salmon. I also have not tried salmon version 1.10.1, so I am not sure whether any of its settings have changed. I suggest you use the same salmon version that I tested.
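To confirm whether the alignment step produced its output before the quantification step starts, the alevin directory can be checked directly. The sketch below uses a hypothetical helper, `check_alevin_output`; the directory layout (`<sample>_alignout/alevin/bfh.txt`) is taken from the error messages in this thread, and the path in the usage comment is a placeholder.

```shell
# Hypothetical helper: confirm that alevin wrote a non-empty bfh.txt for
# one sample's alignment output directory.
check_alevin_output() {
    bfh="$1/alevin/bfh.txt"
    if [ -s "$bfh" ]; then
        echo "found: $bfh"
    else
        echo "missing or empty: $bfh"
        return 1
    fi
}

# Example usage (placeholder path):
# check_alevin_output ScasaOut_SRR10340946/SCASA_My_Project_<timestamp>/1ALIGN/SRR10340946_L001_alignout
# salmon --version    # shows which salmon release is actually being run
```

If bfh.txt is missing, the alevin log files in the same output directory are the place to look for the underlying error.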

Nghia

eudoraleer / scasa

So many Error messages: please help #8
