comprna / RATTLE

Reference-free reconstruction and error correction of transcriptomes from Nanopore long-read sequencing
GNU General Public License v3.0
57 stars · 10 forks

Segmentation fault (core dumped) #26

Open BJ-Chen-Eric opened 3 years ago

BJ-Chen-Eric commented 3 years ago

Hi, thanks for developing the tool. As the title says, when running `rattle cluster` it returns "Segmentation fault (core dumped)". This is the command:

rattle/rattle cluster -i ~/Analysis/data/process/rna1/rna1.filter.fastq.gz -o ~/Analysis/tool/isoform_detection/rat/ --iso --rna

and the output is

RNA mode: 1
Reading fasta file... Done
Segmentation fault (core dumped)

The input file has fewer than 500 thousand reads, and the machine has 16 cores/32 threads with 1 TB of memory. From a previous discussion, limited memory might be the problem, but my input has far fewer reads than that. I hope someone can help me look into it.
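For reference, the read count of an uncompressed FASTQ can be checked with a quick shell one-liner, assuming the standard 4-lines-per-record layout (the file name here is just an illustrative example):

```shell
# Count reads in an uncompressed FASTQ: one record = 4 lines.
# demo.fastq is a made-up two-read example standing in for the real input.
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCCA\n+\nIIIII\n' > demo.fastq
reads=$(( $(wc -l < demo.fastq) / 4 ))
echo "reads: $reads"
```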

novikk commented 3 years ago

Hi, from what I can see in the command, you are trying to use a compressed fastq file, which RATTLE doesn't support as of now. You will need to uncompress it first.

Also, be sure to filter out small reads (we usually filter out those smaller than 150bp).
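After uncompressing (e.g. with `gunzip`), one way to apply the suggested length filter is an awk pass over the FASTQ records; this is a minimal sketch on a made-up two-read input, keeping only sequences of at least 150 bp:

```shell
# Demo FASTQ: one short read and one 200 bp read (illustrative input).
long=$(printf 'A%.0s' $(seq 1 200))
printf '@short\nACGT\n+\nIIII\n@long\n%s\n+\n%s\n' "$long" "$long" > in.fastq

# Read each 4-line record; keep it only if the sequence is >= 150 bp.
awk 'BEGIN{OFS="\n"} {h=$0; getline s; getline p; getline q;
     if (length(s) >= 150) print h, s, p, q}' in.fastq > filtered.fastq
```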

Best, Ivan

BJ-Chen-Eric commented 3 years ago

Thanks for your rapid response. I tried again with an uncompressed and filtered file, but the result is the same. Another update: when I use the split.fastq file as the RATTLE input, it runs, but the output is empty. Should I send you the fastq file to help figure out the problem?

Best wishes

novikk commented 3 years ago

Hi! Yes, please send me the fastq file if that's possible to ivan.delarubia@upf.edu

ziweiwuzw commented 1 year ago

I encountered the same issue.

ziweiwuzw commented 1 year ago

I have 128 CPUs with 1 TB of memory, but I still cannot run the command. Could you help me?

eileen-xue commented 1 year ago

Hi there,

Do you run into any problems when using RATTLE with the example toyset dataset? Can you please check whether your reads contain any invalid bases? RATTLE could run into this issue when generating kmers with reads containing invalid bases.

Hope this helps, Eileen
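One quick way to scan for invalid bases (anything outside A/C/G/T/U) is an awk pass over the sequence lines (every 4th line, offset 2); the input here is a made-up two-read example, one clean and one containing 'N':

```shell
# Demo FASTQ with one clean read and one read containing an 'N'.
printf '@ok\nACGT\n+\nIIII\n@bad\nACNGT\n+\nIIIII\n' > reads.fastq

# Print any sequence line containing a character outside A/C/G/T/U.
awk 'NR%4==2 && /[^ACGTUacgtu]/ {print "line "NR": "$0}' reads.fastq
```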

ziweiwuzw commented 1 year ago

I did not encounter any issues when using RATTLE with the example toyset dataset. I used fastp to filter out low-quality bases, but both the original fastq file (20 GB) and the trimmed fastq file (12 GB) hit the same problem with RATTLE. What does 'invalid bases' refer to? Does it mean the base 'N'?

ziweiwuzw commented 1 year ago

> Hi there,
>
> Do you run into any problems when using RATTLE with the example toyset dataset? Can you please check whether your reads contain any invalid bases? RATTLE could run into this issue when generating kmers with reads containing invalid bases.
>
> Hope this helps, Eileen

I encountered no issues using RATTLE with the example toyset dataset. I used fastp to remove low-quality bases, but both the original fastq file (20 GB) and the trimmed fastq file (12 GB) hit the same problem with RATTLE. Also, when counting the bases in my fastq file, I did not observe any 'N' bases. Could you help me check this issue? :)

eileen-xue commented 1 year ago

Hi,

Valid bases are A, T, C, G, U. All other bases in reads are considered invalid, including 'N'. No need to worry about 'N': RATTLE will filter it out.

Could you please provide your RATTLE command? If possible, can you run RATTLE on your dataset with the '--verbose' flag and share the progress bar output? That would give me more information to identify why and where RATTLE went wrong.

Thanks, Eileen

EduEyras commented 1 year ago

Do you have very short or extremely long reads in your input? E.
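Read-length extremes can be checked with a short awk scan over the sequence lines; this is a minimal sketch on a made-up two-read input (3 bp and 8 bp):

```shell
# Demo FASTQ with a 3 bp and an 8 bp read (illustrative input).
printf '@a\nACG\n+\nIII\n@b\nACGTACGT\n+\nIIIIIIII\n' > reads2.fastq

# Track the minimum and maximum sequence lengths across all records.
awk 'NR%4==2 {l=length($0); if (min=="" || l<min) min=l; if (l>max) max=l}
     END {print "min="min" max="max}' reads2.fastq
```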


ziweiwuzw commented 1 year ago

Dear Eileen, thank you for your assistance! I retried filtering the data with fastp, and fortunately I obtained the desired result when running the same commands. However, when I ran the suggested command on the original fastq file, it still failed, and I could not work out the cause. It is possible that my original fastq file contains duplicate reads and low-quality bases, which could be the reason for the issue. Below are my command and the error output.

ziweiwuzw commented 1 year ago

> Do you have very short or extremely long reads in your input?

Yes, you might be correct. I checked my fastq file and confirmed that it contains reads longer than 150 bp. However, I neglected to check the length of the longest reads.

improudofmyself commented 5 months ago

Facing this issue:

==========================================
SLURM_JOB_ID = 2797830
SLURM_NODELIST = hm02

Starting at Tue May 21 18:49:46 CDT 2024
Job name: Rattle, Job ID: 2797830
I have 4 CPUs on compute node hm02
RNA mode: true
Reading fasta file... Reads: 10527128 Done
/var/spool/slurmd/job2797830/slurm_script: line 82: 2113221 Killed ./rattle cluster -i "$filtered_file" -o "$output_folder" --rna -B 0.5 -b 0.3 -f 0.2
Reading fasta file... Done
Reading fasta file... Done

Using the bigmem partition of our cluster:

$ slurminfo
QUEUE      FREE   TOTAL  FREE   TOTAL  RESORC   OTHER    MAXJOBTIME  CORES  NODE    GPU
PARTITION  CORES  CORES  NODES  NODES  PENDING  PENDING  DAY-HR:MN   /NODE  MEM-GB  (COUNT)
bigmem     48     96     0      2      0        0        7-00:00     48     1500    -

My slurm script looks like this:
improudofmyself commented 5 months ago

echo "Starting at $(date)"
echo "Job name: ${SLURM_JOB_NAME}, Job ID: ${SLURM_JOB_ID}"
echo "I have ${SLURM_CPUS_ON_NODE} CPUs on compute node $(hostname -s)"

# Periodically log memory usage in the background (started here so it
# covers the whole run, not just the end)
while true; do
    echo "Memory usage at $(date):"
    free -h
    sleep 600  # Log every 10 minutes
done &

# Navigate to Porechop directory
cd /home/rkumar/Porechop || exit

# Define input and output paths
input_file="/scratch/g/........./Nanopore_cDNA/A3HE/A3HE.fastq"
output_folder="/scratch/g/...../Nanopore_cDNA/A3HE/Rattle_A3HE"

# Check if input file exists
if [ ! -f "$input_file" ]; then
    echo "Error: Input file not found!"
    exit 1
fi

# Create output directory if it does not exist
mkdir -p "$output_folder/clusters"

# Step 1: Filter reads by length (if needed, adjust according to your data)
filtered_file="${input_file%.fastq}_filtered.fastq"
porechop -i "$input_file" -o "$filtered_file" --discard_middle --min_split_read_size 150

# Check if filtered file was created
if [ ! -f "$filtered_file" ]; then
    echo "Error: Filtered file not created!"
    exit 1
fi

# Navigate to RATTLE directory
cd /home/rkumar/RATTLE || exit

# Step 2: Run the RATTLE clustering commands
./rattle cluster -i "$filtered_file" -o "$output_folder" --rna -B 0.5 -b 0.3 -f 0.2
./rattle cluster_summary -i "$filtered_file" -c "$output_folder/clusters.out" > "$output_folder/cluster_summary.tsv"
./rattle extract_clusters -i "$filtered_file" -c "$output_folder/clusters.out" -o "$output_folder/clusters" --fastq

# Step 3: Correct reads
./rattle correct -i "$filtered_file" -c "$output_folder/clusters.out" -o "$output_folder"

# Step 4: Merge consensi files and run polishing step
consensi_file="$output_folder/consensi.fq"
cat "$output_folder"/*/consensi.fq > "$consensi_file"

# Check if consensi file was created
if [ ! -f "$consensi_file" ]; then
    echo "Error: Consensi file not created!"
    exit 1
fi

./rattle polish -i "$consensi_file" -o "$output_folder" --rna

echo "Finished at $(date)"