WeichenZhou / PALMER

Pre-mAsking Long reads for Mobile Element inseRtion
MIT License

terminate called after throwing an instance of 'std::bad_alloc' #21

Closed mmisak closed 2 years ago

mmisak commented 2 years ago

Hello, I've been trying to run PALMER on my data and it seems to work fine mostly, creating a big number of "chr" folders. However, at some point it throws the following error:

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
/fsimb/cluster/software/slurm_debian9/var/spool/slurmd/job429477/slurm_script: line 41: 35452 Aborted $PALMER_BINARY --input $bam --workdir ${WORKING_FOLDER}/ --ref_ver other --output results --ref_fa $REF_GENOME --type CUSTOMIZED --custom_seq $cons --custom_index $PALMER_TE_INDEX

I think most of the variables in the command I'm using to run PALMER in my script are self-explanatory; $cons is the consensus sequence of my transposable elements of interest (several LINE1 families).

From what I've read, the error I'm getting appears to be due to low memory, but I'm already running PALMER with 20GB of RAM (on the mouse genome).

Do you have any suggestions on how to solve this issue?

Edit: This also happens when running with 200GB of RAM.

WeichenZhou commented 2 years ago

Hi @mmisak

Sorry for the delayed reply. I don't think a lack of RAM is the cause (since you have already tried 200GB). Could you please show me the runtime information that PALMER printed at the point where it terminated, and the files in the last folder it created?

Thank you!

mmisak commented 2 years ago

Hello, I uploaded the last folder it created to my university's filesharing service: https://seafile.rlp.net/d/c1999f720e40444ba430/ Can you please let me know once you have downloaded the data, so that I can delete it again?

I posted the output of stderr in my first post above. The last Stdout output is this:

1. Samtools Step for region chr14_76000001_77000000 now completed.
Pre-masking step for chr14_76000001_77000000 completed.
Blastn Step for region chr14_76000001_77000000 completed.
Single read calling step for chr14_76000001_77000000 completed.

WeichenZhou commented 2 years ago

Hi @mmisak I have downloaded the folder. I will look at it over the next few days and let you know when I'm done.

Thank you for sharing the data. Best

WeichenZhou commented 2 years ago

In the meantime, would you mind testing it again with ncbi-blast++/2.10.0?

WeichenZhou commented 2 years ago

Would you mind sharing a sample of the bam that triggers the error, or at least the header of the bam, so that I can create one from the sam you've already sent and reproduce the error? (The whole bam would be preferable.)

Thank you!

mmisak commented 2 years ago

I originally ran it with blast 2.12.0, that should work I guess? I couldn't find 2.10.0 on conda, which I am using for my environment.

I uploaded the bam here: https://seafile.rlp.net/f/bb31f44f1d9541f798e7/

Can you please let me know again once you have downloaded it, so that I can delete it?

WeichenZhou commented 2 years ago

Hi @mmisak

You could try to visit https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.10.0/ for this version. I have downloaded your data now and will get back to you later.

Best,

WeichenZhou commented 2 years ago

Hi @mmisak Could you provide the --custom_seq and --custom_index files you used in the run that failed? Thanks a lot!

Best

mmisak commented 2 years ago

Hello, sorry, I didn't have time to re-try with BLAST 2.10.0 so far.

Here are the custom_seq and custom_index I used: https://seafile.rlp.net/d/c0f120daffe448698435/

WeichenZhou commented 2 years ago

Hi @mmisak Thank you so much for the files.... I am running PALMER on your data now. Will get back to you soon.

Best

mmisak commented 2 years ago

Thank you for looking into this.

WeichenZhou commented 2 years ago

Hi @mmisak

I was able to trace the error to the sequence content of a particular read. However, it might take a little more time to troubleshoot this issue completely.

In the meantime, if you need results sooner, you can add the parameters --start and --end to skip the blocks that cause the error. For example, on chr14, PALMER throws the error in the block chr14_76000001_77000000, as well as in the blocks chr14_88000001_89000000 and chr14_113000001_114000000. You can run the remaining regions separately by adding these parameters to your command line: --chr chr14 --start 1 --end 76000000, --chr chr14 --start 77000001 --end 88000000, --chr chr14 --start 89000001 --end 113000000, and --chr chr14 --start 114000001 --end 125000000 (125000000 is the approximate end of chr14). The drawback is that this is laborious: each time you encounter a similar error, you have to add one more command line to skip one more block, and PALMER will not discover the elements of interest in the skipped blocks.
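
As a rough sketch of the workaround above, the ranges to run can be computed from the list of failing blocks instead of being written out by hand. The helper name `ranges_to_run` is hypothetical and not part of PALMER; the only PALMER options used are the --chr, --start and --end parameters mentioned above, and 125000000 is the approximate chr14 end given in this thread. Skipping all three listed blocks yields four ranges.

```python
def ranges_to_run(chrom_end, bad_blocks):
    """Return the (start, end) intervals that cover 1..chrom_end
    while leaving out every block in bad_blocks (1-based, inclusive).

    Hypothetical helper for illustration; not part of PALMER.
    """
    runs = []
    cursor = 1  # first position not yet covered or skipped
    for bad_start, bad_end in sorted(bad_blocks):
        if bad_start > cursor:
            runs.append((cursor, bad_start - 1))
        cursor = max(cursor, bad_end + 1)
    if cursor <= chrom_end:
        runs.append((cursor, chrom_end))
    return runs


# Blocks reported in this thread as triggering std::bad_alloc on chr14.
bad = [
    (76000001, 77000000),
    (88000001, 89000000),
    (113000001, 114000000),
]

# 125000000 is the approximate end of chr14 quoted in the thread.
runs = ranges_to_run(125000000, bad)

for start, end in runs:
    # Append these parameters to the usual PALMER command line.
    print(f"--chr chr14 --start {start} --end {end}")
```

Each printed line corresponds to one separate PALMER run. If another block fails later, adding it to `bad` and regenerating the ranges avoids editing the command lines manually.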

I will keep working on this issue, and hopefully I can release an update that fixes it as soon as possible.

Thank you so much!!

mmisak commented 2 years ago

Hello, I'm glad you could reproduce the error. For now, I am working on a different type of analysis, so there is no hurry. But I'd be glad to try PALMER with my data as soon as you manage to fix the error.

Thanks again for investigating the issue.

WeichenZhou commented 2 years ago

Hi @mmisak Please check the new version of PALMER.

Best

WeichenZhou commented 2 years ago

I am closing this issue now.