GriffinBINF opened 10 months ago
Update: I have been able to run STAR successfully with some STARsolo options to align the reads to the genome. It does create the Solo.out folder, along with the barcodes.tsv file and a mostly empty matrix file. This leads me to believe that the segfault is not caused by write permissions or by a missing output location for the counts.
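To put a number on "mostly empty", a small sketch like the following reads the size line of the MatrixMarket file. It assumes STARsolo's default layout, where the Gene counts are written to Solo.out/Gene/raw/matrix.mtx with features as rows and barcodes as columns. The function name mtx_summary is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarise a MatrixMarket coordinate file such as
# Solo.out/Gene/raw/matrix.mtx (path assumed from STARsolo's default layout).
mtx_summary() {
  mtx=$1
  if [ ! -r "$mtx" ]; then
    printf 'cannot read %s\n' "$mtx" >&2
    return 1
  fi
  # Lines starting with % are the banner and comments; the first other line
  # holds "rows columns entries", i.e. features, barcodes, non-zero counts.
  awk '!/^%/ { printf "features=%s barcodes=%s nonzero=%s\n", $1, $2, $3; exit }' "$mtx"
}

# Example (path is an assumption):
# mtx_summary "${OUTPUT_DIR}/${ACCESSION}/Solo.out/Gene/raw/matrix.mtx"
```

A non-zero count close to 0 relative to the number of barcodes would confirm that almost nothing was counted, as opposed to a truncated file.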
Here is my current command:
STAR --genomeDir ${GENOME_DIR} \
  --runThreadN 64 \
  --readFilesIn ${RAW_DATA_DIR}/${ACCESSION}/${ACCESSION}_2.fastq ${RAW_DATA_DIR}/${ACCESSION}/${ACCESSION}_1.fastq \
  --outFileNamePrefix ${OUTPUT_DIR}/${ACCESSION}/ \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --outSAMattributes Standard \
  --soloType CB_UMI_Simple \
  --soloCBwhitelist whitelist.txt
This is somewhat helpful for my workflow, because I can at least run velocyto manually on the BAM and barcode outputs. The major barrier now is that I have so far been unable to run cellranger count directly on the files, because cellranger does not accept the GTF file I am supplying for the transcriptome, for reasons I have not yet pinned down. This particular error is outside the scope of this help request, but I think context about my overall workflow could be helpful.
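For the manual velocyto step, a sketch along these lines assembles the `velocyto run` call from STAR's default output names (Aligned.sortedByCoord.out.bam, plus barcodes under Solo.out/Gene/filtered/). The helper name velocyto_cmd and the velocyto output subfolder are hypothetical, and the function only prints the command so it can be checked before running. One caveat, to my understanding: velocyto reads cell barcodes and UMIs from the CB and UB BAM tags, and STARsolo only writes those tags when they are requested (for example `--outSAMattributes NH HI nM AS CB UB` together with coordinate-sorted BAM output). `--outSAMattributes Standard`, as in the command above, does not include them.

```shell
#!/bin/sh
# Hypothetical helper: print the velocyto command for one STARsolo run.
# Assumes STAR's default output names under the --outFileNamePrefix folder.
velocyto_cmd() {
  star_out=$1  # the --outFileNamePrefix folder, e.g. ${OUTPUT_DIR}/${ACCESSION}
  gtf=$2       # the GTF annotation used to build the STAR index
  bam="$star_out/Aligned.sortedByCoord.out.bam"
  barcodes="$star_out/Solo.out/Gene/filtered/barcodes.tsv"
  # Warn (but still print the command) when an expected input is missing.
  for f in "$bam" "$barcodes" "$gtf"; do
    [ -e "$f" ] || printf 'warning: %s not found\n' "$f" >&2
  done
  printf 'velocyto run -b %s -o %s %s %s\n' \
    "$barcodes" "$star_out/velocyto" "$bam" "$gtf"
}

# Example (paths are assumptions):
# velocyto_cmd "${OUTPUT_DIR}/${ACCESSION}" genes.gtf
```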
Another quick update: I ran the pipeline on human data and it worked fine, producing all of the expected soloFeatures outputs.
Hi @GriffinBINF
This looks like a bug in the Velocyto calculation for a non-trivial genome. Could you please send me the Log.out file for the failed run?
Hi Alex,
Thank you so much for looking into this for me. Here is the log file for the most recent run: Log.out.txt
I am also investigating other potential causes, such as the architecture of the cluster I am using, since some collaborators were able to obtain the counts on their own server.
Also, here is the only error message in the .err file: line 8: 243470 Segmentation fault "${cmd}" "$@"
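For context on that message: it has the form bash uses when a child process is killed by a signal (script line, process ID, signal description, command text), so the program launched by `"${cmd}" "$@"` on line 8 of a wrapper script was terminated by SIGSEGV. The shell reports such a process with exit status 128 plus the signal number, i.e. 139 for SIGSEGV (signal 11 on Linux). A minimal sketch, with a hypothetical function name, that turns an exit status into a readable description so a job script can log it right after STAR returns:

```shell
#!/bin/sh
# Hypothetical helper: describe a command's exit status.
# By shell convention, a status above 128 means the process was killed by
# signal (status - 128); on Linux SIGSEGV is signal 11, giving status 139.
describe_exit() {
  status=$1
  if [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    # kill -l maps a signal number to its name, e.g. 11 -> SEGV.
    printf 'killed by signal %s (SIG%s)\n' "$sig" "$(kill -l "$sig")"
  else
    printf 'exited with status %s\n' "$status"
  fi
}

# Usage in a job script (STAR arguments elided):
# STAR --genomeDir "${GENOME_DIR}" ...
# describe_exit "$?"
```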
Please let me know if you can determine anything.
Best, Griffin
Hi Griffin,
I did not see anything suspicious in the Log.out.txt file. If the same job ran successfully on a different server, it may indeed be a problem with the cluster.
Cheers, Alex
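Since the same job reportedly succeeds on another server, one low-effort step is to record the same basic facts on both machines (on the cluster, ideally from inside a batch job on the compute node) and compare them. The sketch below is generic rather than STAR-specific; the helper name report_env is hypothetical, and the /proc/cpuinfo lookup only works on Linux.

```shell
#!/bin/sh
# Hypothetical helper: print machine facts that often differ between a
# cluster node and a workstation. Run it where STAR actually runs.
report_env() {
  printf 'host: %s\n' "$(uname -n)"
  printf 'arch: %s\n' "$(uname -m)"
  printf 'kernel: %s\n' "$(uname -r)"
  printf 'stack limit (ulimit -s): %s\n' "$(ulimit -s)"
  printf 'core file limit (ulimit -c): %s\n' "$(ulimit -c)"
  # CPU model name (Linux only; other systems lack /proc/cpuinfo).
  if [ -r /proc/cpuinfo ]; then
    printf 'cpu:%s\n' "$(awk -F: '/model name/ { print $2; exit }' /proc/cpuinfo)"
  fi
  # Which STAR binary is picked up from PATH, if any.
  printf 'STAR binary: %s\n' "$(command -v STAR || echo 'not on PATH')"
}

# Usage: report_env > "env_$(uname -n).txt" on each machine, then diff the files.
```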
Hi Alex,
It's unfortunate that it was not something more obvious. If it isn't too much trouble, do you know of good ways I could continue troubleshooting to pin down where the issue occurs on the cluster?
Additionally, I was reading the documentation, and it suggested reaching out to you/the team for jobs involving very large or very small genomes. Do you have any parameter suggestions for the axolotl genome that would differ from the default settings?
Thank you very much for your help.
Cheers, Griffin
Hi Griffin,
The genome index looks fine.
I would recommend removing some parameters to see where the problem comes from.
I would start by removing --twopassMode Basic.
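The "remove one parameter at a time" approach can be scripted so that each variant runs as its own job. The sketch below only prints the candidate commands. The option groups are an assumption: the full failing command is not shown in this thread, so it uses --twopassMode Basic (mentioned above), --soloFeatures Gene Velocyto (Velocyto being the suspected feature) and sorted-BAM output as examples. Dropping --soloFeatures entirely falls back to STARsolo's default Gene feature. The names BASE_ARGS, OPTIONAL_GROUPS and command_without are hypothetical.

```shell
#!/bin/sh
# Sketch of "remove one parameter at a time": print one STAR command per
# optional option group, each with that single group left out.
# BASE_ARGS are kept in every variant; OPTIONAL_GROUPS is an assumed list
# (one group per line) to be replaced with the failing command's own options.
BASE_ARGS="--genomeDir ${GENOME_DIR} --runThreadN 64 \
--readFilesIn ${RAW_DATA_DIR}/${ACCESSION}/${ACCESSION}_2.fastq ${RAW_DATA_DIR}/${ACCESSION}/${ACCESSION}_1.fastq \
--outFileNamePrefix ${OUTPUT_DIR}/${ACCESSION}/ \
--soloType CB_UMI_Simple --soloCBwhitelist whitelist.txt"

OPTIONAL_GROUPS='--twopassMode Basic
--soloFeatures Gene Velocyto
--outSAMtype BAM SortedByCoordinate'

# Print the STAR command that omits option group number $1 (1-based).
command_without() {
  skip=$1
  i=0
  cmd="STAR $BASE_ARGS"
  while IFS= read -r group; do
    i=$((i + 1))
    if [ "$i" -ne "$skip" ]; then
      cmd="$cmd $group"
    fi
  done <<EOF
$OPTIONAL_GROUPS
EOF
  printf '%s\n' "$cmd"
}

# Print every variant, one per line.
n=$(printf '%s\n' "$OPTIONAL_GROUPS" | wc -l)
k=1
while [ "$k" -le "$((n))" ]; do
  command_without "$k"
  k=$((k + 1))
done
```

Giving each variant its own --outFileNamePrefix keeps the runs from overwriting one another; the variant that no longer segfaults points at the responsible option.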
Issue Description: I am experiencing a segmentation fault when using STARsolo (version 2.7.11a) to process large single-cell RNA sequencing datasets from axolotl leukocytes. The datasets are paired 10x Genomics FASTQ files from the SRA with accession IDs SRR10445716 to SRR10445723.
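Since the STAR command is written per ${ACCESSION}, the eight datasets can be driven from a generated list, for example to feed a loop or a job array. A hypothetical helper that expands the accession range SRR10445716 to SRR10445723:

```shell
#!/bin/sh
# Hypothetical helper: print one SRA accession per line for a numeric range.
accession_range() {
  prefix=$1
  first=$2
  last=$3
  n=$first
  while [ "$n" -le "$last" ]; do
    printf '%s%s\n' "$prefix" "$n"
    n=$((n + 1))
  done
}

# Loop over the datasets named in this issue; the STAR command would go here.
for ACCESSION in $(accession_range SRR 10445716 10445723); do
  printf 'processing %s\n' "$ACCESSION"
done
```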
System and Resource Allocation:
Input Data:
Error Details:
Troubleshooting Attempts:
Request for Assistance: I am seeking advice on resolving this segmentation fault. Are there any known issues with STARsolo handling large datasets, or could there be specific parameter adjustments that might mitigate this error? Any insights or suggestions for workarounds would be greatly appreciated. As an alternative, I'm considering aligning with STAR and then using Velocyto or another tool for gene and splicing counts, but I am open to recommendations.