I have been trying to perform alternative splicing using SUPPA2. Briefly, I have 28 human samples on which RNAseq was performed. From those, I have generated the count files with Kallisto using gencode.v44 indices. Using the same gencode.v44 gtf file, I have generated the different .ioe files for alternate splicing events. Now, my original abundance.tsv file generated by kallisto looked like this:
$ head abundance.tsv
target_id length eff_length est_counts tpm
ENST00000456328.2|ENSG00000290825.1|-|OTTHUMT00000362751.1|DDX11L2-202|DDX11L2|1657|lncRNA| 1657 1658 0 0
ENST00000450305.2|ENSG00000223972.6|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene| 632 633 0 0
ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene| 1351 1352 0 0
ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA| 68 69 0 0
ENST00000473358.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002840.1|MIR1302-2HG-202|MIR1302-
Now I processed it to look like this:
$ head abundance_preprocessed.tsv
ENST00000456328.2 0.000000
ENST00000450305.2 0.000000
ENST00000488147.1 0.000000
ENST00000619216.1 0.000000
ENST00000473358.1 0.000000
ENST00000469289.1 0.000000
ENST00000607096.1 0.000000
ENST00000417324.1 0.000000
ENST00000461467.1 0.000000
ENST00000606857.1 0.000000
One of the generated ioe files looks like this:
So now I am trying to generate Psi results using the slurm:
!/bin/bash
SBATCH --job-name=suppa_psi_calculation
SBATCH --output=/home/achanda/%x_%j.out
SBATCH --error=/home/achanda/%x_%j.err
SBATCH --time=04:00:00
SBATCH --cpus-per-task=16
SBATCH --mem=128G
Load Python or any required modules
module load python3
Directories
SUPPA_DIR=~/software/SUPPA-2.3
EVENT_DIR=~/AS_events
PROCESSED_DIR=~/Processed # Correct directory for processed files
OUT_DIR=~/psi_results
Create the output directory
mkdir -p ${OUT_DIR}
Iterate over each event type
for EVENT_FILE in ${EVENT_DIR}/*.ioe; do
EVENT_TYPE=$(basename ${EVENT_FILE} .ioe) # Correct extraction of the event type
# Iterate over each sample's processed file in the Processed directory
for PROCESSED_FILE in ${PROCESSED_DIR}/*_abundance_preprocessed.tsv; do
SAMPLE_NAME=$(basename ${PROCESSED_FILE} _abundance_preprocessed.tsv) # Correct extraction of the sample name
OUTPUT_PREFIX="${OUT_DIR}/${SAMPLE_NAME}_${EVENT_TYPE}"
echo "Starting processing of ${SAMPLE_NAME} for event type ${EVENT_TYPE}..."
python3 ${SUPPA_DIR}/suppa.py psiPerEvent -i ${EVENT_FILE} -e ${PROCESSED_FILE} -o ${OUTPUT_PREFIX}
if [ $? -eq 0 ]; then
echo "Successfully processed ${SAMPLE_NAME} for event type ${EVENT_TYPE}."
else
echo "Failed to process ${SAMPLE_NAME} for event type ${EVENT_TYPE}. Check error logs for details."
fi
done
done
However,
$ head suppa_psi_calculation_28165525.err
INFO:lib.tools:File ~/Sample1_abundance_preprocessed.tsv opened in reading mode.
INFO:psiCalculator:Buffering transcript expression levels.
ERROR:lib.tools:1, in line 2. Skipping line...
ERROR:lib.tools:2, in line 3. Skipping line...
ERROR:lib.tools:3, in line 4. Skipping line...
ERROR:lib.tools:4, in line 5. Skipping line...
ERROR:lib.tools:5, in line 6. Skipping line...
ERROR:lib.tools:6, in line 7. Skipping line...
ERROR:lib.tools:7, in line 8. Skipping line...
ERROR:lib.tools:8, in line 9. Skipping line...
Can someone help with what is going wrong? I can provide further details if needed.
Hi Chandaayan,
your abundance file seems to have many zeroes.
Do you get non-zero abundances for your transcripts? That could trigger some of the errors.
E.
I have been trying to perform alternative splicing using SUPPA2. Briefly, I have 28 human samples on which RNAseq was performed. From those, I have generated the count files with Kallisto using gencode.v44 indices. Using the same gencode.v44 gtf file, I have generated the different .ioe files for alternate splicing events. Now, my original abundance.tsv file generated by kallisto looked like this:
$ head abundance.tsv target_id length eff_length est_counts tpm ENST00000456328.2|ENSG00000290825.1|-|OTTHUMT00000362751.1|DDX11L2-202|DDX11L2|1657|lncRNA| 1657 1658 0 0 ENST00000450305.2|ENSG00000223972.6|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene| 632 633 0 0 ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene| 1351 1352 0 0 ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA| 68 69 0 0 ENST00000473358.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002840.1|MIR1302-2HG-202|MIR1302- Now I processed it to look like this:
$ head abundance_preprocessed.tsv ENST00000456328.2 0.000000 ENST00000450305.2 0.000000 ENST00000488147.1 0.000000 ENST00000619216.1 0.000000 ENST00000473358.1 0.000000 ENST00000469289.1 0.000000 ENST00000607096.1 0.000000 ENST00000417324.1 0.000000 ENST00000461467.1 0.000000 ENST00000606857.1 0.000000 One of the generated ioe files looks like this:
$ head gencode.v44_A3_strict.ioe seqname gene_id event_id alternative_transcripts total_transcripts chr1 ENSG00000237491.10 ENSG00000237491.10;A3:chr1:779092-803919:779092-803951:+ ENST00000655765.1,ENST00000670700.1,ENST00000657896.1,ENST00000658648.1,ENST00000656571.1,ENST00000669749.1,ENST00000666217.1,ENST00000665719.1 ENST00000665719.1,ENST00000412115.2,ENST00000666217.1,ENST00000658648.1,ENST00000669749.1,ENST00000657896.1,ENST00000670700.1,ENST00000655765.1,ENST00000656571.1 chr1 ENSG00000237491.10 ENSG00000237491.10;A3:chr1:779092-803919:779092-803922:+ ENST00000655765.1,ENST00000670700.1,ENST00000657896.1,ENST00000658648.1,ENST00000656571.1,ENST00000669749.1,ENST00000666217.1,ENST00000665719.1 ENST00000665719.1,ENST00000443772.2,ENST00000666217.1,ENST00000658648.1,ENST00000669749.1,ENST00000657896.1,ENST00000670700.1,ENST00000655765.1,ENST00000656571.1 chr1 ENSG00000237491.10 ENSG00000237491.10;A3:chr1:807323-809622:807323-809658:+ ENST00000412115.2,ENST00000670700.1,ENST00000657896.1,ENST00000658648.1 ENST00000665719.1,ENST00000658648.1,ENST00000588951.5,ENST00000669749.1,ENST00000657896.1,ENST00000670700.1,ENST00000412115.2,ENST00000656571.1
So now I am trying to generate Psi results using the slurm:
!/bin/bash
SBATCH --job-name=suppa_psi_calculation
SBATCH --output=/home/achanda/%x_%j.out
SBATCH --error=/home/achanda/%x_%j.err
SBATCH --time=04:00:00
SBATCH --cpus-per-task=16
SBATCH --mem=128G
Load Python or any required modules
module load python3
Directories
SUPPA_DIR=~/software/SUPPA-2.3 EVENT_DIR=~/AS_events PROCESSED_DIR=~/Processed # Correct directory for processed files OUT_DIR=~/psi_results
Create the output directory
mkdir -p ${OUT_DIR}
Iterate over each event type
for EVENT_FILE in ${EVENT_DIR}/*.ioe; do EVENT_TYPE=$(basename ${EVENT_FILE} .ioe) # Correct extraction of the event type
done
However,
$ head suppa_psi_calculation_28165525.err INFO:lib.tools:File ~/Sample1_abundance_preprocessed.tsv opened in reading mode. INFO:psiCalculator:Buffering transcript expression levels. ERROR:lib.tools:1, in line 2. Skipping line... ERROR:lib.tools:2, in line 3. Skipping line... ERROR:lib.tools:3, in line 4. Skipping line... ERROR:lib.tools:4, in line 5. Skipping line... ERROR:lib.tools:5, in line 6. Skipping line... ERROR:lib.tools:6, in line 7. Skipping line... ERROR:lib.tools:7, in line 8. Skipping line... ERROR:lib.tools:8, in line 9. Skipping line...
Can someone help with what is going wrong? I can provide further details if needed.