Xinglab / rmats-turbo

Issue with Opening BAM Files in rMATS on WSL Environment #351

Open Camwalker9900 opened 8 months ago

Camwalker9900 commented 8 months ago

Hello Biostars Community/rMATS Developers,

I am encountering a persistent issue when trying to run rMATS (RNA-Seq-MATS) on a Windows Subsystem for Linux (WSL) environment. Despite following the standard setup and execution procedures, rMATS consistently fails to open the specified BAM files for analysis. I am seeking assistance in troubleshooting and resolving this issue.

Environment Details:

Operating System: Windows 10 with WSL
Linux Distribution in WSL: Ubuntu-20.04
rMATS Version: [Specify the version of rMATS you are using]
Python Version: [Specify the version of Python used]

Problem Description: When executing the rMATS script with properly formatted input files, it systematically fails to open the BAM files, and the process ends with a Python ValueError. The issue occurs despite confirming that the BAM files are accessible and readable within the WSL environment.

Error Messages: The primary error messages received are as follows:

Fail to open /mnt/c/Users/u253262/Documents/Output/sample_x_Aligned.sortedByCoord.out.bam
[...similar messages for other BAM files...]
ValueError: invalid literal for int() with base 10: '/mnt/c/Users/u253262/Documents/Output/sample_x_Aligned.sortedByCoord.out.bam'

Steps Taken:

Confirmed that the BAM files are accessible in WSL by running Python scripts and basic Linux commands (ls, head).
Ensured that file paths are correctly formatted for WSL, using absolute paths.
Attempted running rMATS with different subsets of BAM files.
[Any other troubleshooting steps you've taken]

I am looking for insights into what might be causing this issue and how to resolve it. Is this a known problem with rMATS in a WSL environment, or could it be related to how file paths are handled in the script? Any advice or suggestions would be greatly appreciated.

Thank you for your time and assistance.

EricKutschera commented 8 months ago

https://www.biostars.org/p/9583423/

EricKutschera commented 8 months ago

See this post: https://github.com/Xinglab/rmats-turbo/issues/322. The issue in that case was using newlines instead of commas to separate the BAM paths in the file given to --b1.
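
A minimal sketch of how a newline-separated BAM list could be rewritten as the single comma-separated line described above. The function name normalize_bam_list and the example paths are hypothetical, chosen only for illustration; this is not part of rMATS itself.

```python
def normalize_bam_list(text):
    """Join BAM paths separated by newlines and/or commas into one comma-separated line."""
    # Treat commas and newlines alike as separators, then drop empty entries.
    paths = [p.strip() for p in text.replace(",", "\n").splitlines()]
    return ",".join(p for p in paths if p)


# Hypothetical contents of a group file that lists one BAM per line.
example = "/data/sample_1.bam\n/data/sample_2.bam\n"
print(normalize_bam_list(example))
```

The resulting line could then be written back to the text file passed to --b1 (and likewise for --b2).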

Camwalker9900 commented 8 months ago

I am reaching out again regarding an issue with my recent rMATS run, where the output indicates that no reads are being used ('USED: 0'), despite seemingly correct inputs. Here's a snippet of my command output:

(base) camwalker@SUSANH-DT-01:/mnt/c/Users/u253262/downloads/rmats_turbo_v4_2_0$ python3 /mnt/c/Users/u253262/downloads/rmats_turbo_v4_2_0/rmats.py \
--b1 /mnt/c/Users/u253262/downloads/rmats_turbo_v4_2_0/group5.txt \
--b2 /mnt/c/Users/u253262/downloads/rmats_turbo_v4_2_0/group7.txt \
--gtf /mnt/c/Users/u253262/Documents/M33_ref/gencode.vM33.chr_patch_hapl_scaff.annotation.gtf \
--od /mnt/c/Users/u253262/downloads/rmats_turbo_v4_2_0/output \
--tmp /mnt/c/Users/u253262/downloads/rmats_turbo_v4_2_0/tmp \
--readLength 149 \
--nthread 6 \
--allow-clipping
gtf: 8.348881959915161
There are 56941 distinct gene ID in the gtf file
There are 149547 distinct transcript ID in the gtf file
There are 35471 one-transcript genes in the gtf file
There are 869452 exons in the gtf file
There are 28169 one-exon transcripts in the gtf file
There are 22802 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 2.626350
Average number of exons per transcript is 5.813905
Average number of exons per transcript excluding one-exon tx is 6.931100
Average number of gene per geneGroup is 7.532353
statistic: 0.060060739517211914

read outcome totals across all BAMs
USED: 0
NOT_PAIRED: 0
NOT_NH_1: 200852138
NOT_EXPECTED_CIGAR: 7548058
NOT_EXPECTED_READ_LENGTH: 466124400
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 0
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 0
CLIPPED: 0
total: 674524596
outcomes by BAM written to: /mnt/c/Users/u253262/downloads/rmats_turbo_v4_2_0/tmp/2023-12-27-15_35_12_157915_read_outcomes_by_bam.txt

novel: 3583.2080335617065
The splicing graph and candidate read have been saved into /mnt/c/Users/u253262/downloads/rmats_turbo_v4_2_0/tmp/2023-12-27-15_35_12157915*.rmats
save: 0.01957249641418457
loadsg: 0.02886652946472168

==========
Done processing each gene from dictionary to compile AS events
Found 21064 exon skipping events
Found 993 exon MX events
Found 9346 alt SS events
There are 5893 alt 3 SS events and 3453 alt 5 SS events.
Found 4520 RI events

ase: 1.1658132076263428
count: 0.15857434272766113
Processing count files.
Done processing count files.

Subject: Assistance Requested: rMATS Output Interpretation - Zero Reads Used

Dear rMATS Development Team,

I am puzzled as to why rMATS is not utilizing any of my reads, and I'm concerned about the potential impact on the analysis accuracy. Could you provide insights into possible reasons for this outcome and suggest any corrective actions? Any details on how rMATS processes and categorizes reads would be extremely helpful.

EricKutschera commented 8 months ago

From the output, about 70% of the reads were not used because they didn't have the expected read length:

NOT_EXPECTED_READ_LENGTH: 466124400
total: 674524596

If the reads are all the same length, you can change --readLength 149 to the actual read length. If the reads are of different lengths, you can add --variable-read-length instead.
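
To make the arithmetic behind the "about 70%" figure explicit, here is a small sketch that turns the read outcome totals posted above into percentages. The counts are copied from that output (zero-count categories other than USED are omitted); the function name outcome_percentages is hypothetical and not part of rMATS.

```python
def outcome_percentages(counts):
    """Return each read outcome as a percentage of all reads."""
    total = sum(counts.values())
    if total == 0:
        # No reads at all: report 0% everywhere instead of dividing by zero.
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Counts copied from the "read outcome totals across all BAMs" output above.
counts = {
    "USED": 0,
    "NOT_NH_1": 200852138,
    "NOT_EXPECTED_CIGAR": 7548058,
    "NOT_EXPECTED_READ_LENGTH": 466124400,
}

for name, pct in outcome_percentages(counts).items():
    print(f"{name}: {pct:.1f}%")
```

The three non-zero categories add up to the reported total of 674524596, so every read falls into one of them.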