AlgoLab / galig

A graph aligner
GNU General Public License v3.0

FAILED dependencies of target libtbb; tbb-2018_U3.tgz #5

Closed amitfenn closed 4 years ago

amitfenn commented 4 years ago

Greetings, maintainers,

I need your tool on a docker container. However, I haven't been able to install the tool. Could you help me out?

Commands run:

$ docker run -it --rm ubuntu:latest
root@\:/docker_main# cat /etc/issue
Ubuntu 18.04.4 LTS \n \l

root@\/docker_main# apt-get update && apt-get install build-essential git python3 python3-pip python3-setuptools python3-biopython python3-biopython-sql python3-pysam cmake libboost1.65-all-dev samtools unzip wget curl zlib1g-dev liblzma-dev libjemalloc-dev libjemalloc1 libghc-bzlib-dev libgff-dev libtbb-dev

root@\/docker_main# pip3 install gffutils; git clone --recursive https://github.com/AlgoLab/galig.git ; cd galig; make prerequisites

ERROR message

. .. ...
[ 23%] Completed 'libstadenio'
[ 23%] Built target libstadenio
Scanning dependencies of target libtbb
[ 24%] Creating directories for 'libtbb'
[ 25%] Performing download step for 'libtbb'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 125 100 125 0 0 679 0 --:--:-- --:--:-- --:--:-- 679
100 126 100 126 0 0 345 0 --:--:-- --:--:-- --:--:-- 345
100 2843k 0 2843k 0 0 2008k 0 --:--:-- 0:00:01 --:--:-- 5407k
tbb-2018_U3.tgz: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
tbb-2018_U3.tgz did not match expected SHA256! Exiting.
CMakeFiles/libtbb.dir/build.make:89: recipe for target 'libtbb-prefix/src/libtbb-stamp/libtbb-download' failed
make[4]: [libtbb-prefix/src/libtbb-stamp/libtbb-download] Error 1
CMakeFiles/Makefile2:178: recipe for target 'CMakeFiles/libtbb.dir/all' failed
make[3]: [CMakeFiles/libtbb.dir/all] Error 2
Makefile:162: recipe for target 'all' failed
make[2]: [all] Error 2
[ 8%] Built target libcereal
[ 15%] Built target libdivsufsort
[ 23%] Built target libstadenio
[ 24%] Performing download step for 'libtbb'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 125 100 125 0 0 1893 0 --:--:-- --:--:-- --:--:-- 1893
100 126 100 126 0 0 1482 0 --:--:-- --:--:-- --:--:-- 1482
100 2843k 0 2843k 0 0 2660k 0 --:--:-- 0:00:01 --:--:-- 3650k
tbb-2018_U3.tgz: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
tbb-2018_U3.tgz did not match expected SHA256! Exiting.
CMakeFiles/libtbb.dir/build.make:89: recipe for target 'libtbb-prefix/src/libtbb-stamp/libtbb-download' failed
make[4]: [libtbb-prefix/src/libtbb-stamp/libtbb-download] Error 1
CMakeFiles/Makefile2:178: recipe for target 'CMakeFiles/libtbb.dir/all' failed
make[3]: [CMakeFiles/libtbb.dir/all] Error 2
Makefile:162: recipe for target 'all' failed
make[2]: [all] Error 2
/galig/Makefile:126: recipe for target '/galig/salmon/bin/salmon' failed
make[1]: [/galig/salmon/bin/salmon] Error 2
target.mk:16: recipe for target '/galig/obj' failed
make: [/galig/obj] Error 2


Any help would be much appreciated. Thank you, Amit

ldenti commented 4 years ago

Hi, the error occurs while compiling tbb (a dependency of salmon): it seems that the developers of tbb changed something in their release.
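For context, the `FAILED` line in the log above comes from comparing the SHA-256 digest of the downloaded `tbb-2018_U3.tgz` against a value pinned in the build. A minimal sketch of that kind of check, assuming GNU coreutils `sha256sum` is available (as on the Ubuntu image used here); `verify_sha256` is a hypothetical helper name for illustration, not part of salmon's build scripts, and the demo file is a stand-in for the real archive:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Returns 0 when they match, 1 otherwise.
verify_sha256() {
    local file="$1" expected="$2" actual
    actual="$(sha256sum "$file" | awk '{print $1}')"
    if [ "$actual" = "$expected" ]; then
        echo "$file: OK"
    else
        echo "$file: FAILED (got $actual)" >&2
        return 1
    fi
}

# Demo on a throwaway file whose content is the five bytes "hello".
demo="$(mktemp)"
printf 'hello' > "$demo"
verify_sha256 "$demo" 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
rm -f "$demo"
```

If the upstream archive is regenerated, its digest changes and a check like this fails even though the download itself succeeded, which would be consistent with the log above.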

I found a possible workaround (but not a fix) to the problem.

Open the CMakeLists.txt file from the cloned salmon repository and edit the following lines:

This is a patch you can apply: CMakeLists.txt.patch.txt.

Let me know if it worked.

Best, Luca

amitfenn commented 4 years ago

Okay, so I got the salmon patch in, but is there a way to stop asgal from downloading and installing salmon again? I still get the same error message.

ldenti commented 4 years ago

So you applied the patch, ran make prerequisites again, and it failed with the same message? That's quite strange; I just tried it and it worked.

This is what I did (inside the docker container):

# install dependencies with apt and pip3
git clone --recursive https://github.com/AlgoLab/galig.git
cd galig/salmon
wget https://github.com/AlgoLab/galig/files/4437983/CMakeLists.txt.patch.txt
git apply CMakeLists.txt.patch.txt
cd ..
make salmon # you can compile just salmon running this
# Then you have to compile lemon and sdsl
# make lemon sdsl

It shouldn't be necessary, but if you had already cloned the repository in your container and tried to compile salmon, you can try removing the folders salmon/external and salmon/build.
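A small sketch of that cleanup step, assuming it is given the path to the galig checkout; `clean_salmon_build` is a hypothetical helper name, and it only touches the two folders named above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove salmon's cached downloads and build tree
# so that the next `make salmon` starts from a clean state.
clean_salmon_build() {
    local repo="$1"
    # Refuse to run unless the argument looks like a galig checkout.
    if [ ! -d "$repo/salmon" ]; then
        echo "no salmon folder under '$repo'" >&2
        return 1
    fi
    rm -rf -- "$repo/salmon/external" "$repo/salmon/build"
}
```

From the directory that contains the clone, this would be called as `clean_salmon_build galig`, followed by `make salmon` inside the repository.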

ldenti commented 4 years ago

Anyway, since the problem you are encountering affects the compilation of asgal on any system (not only in a docker container), I removed salmon as a submodule and modified the Makefile to download the pre-compiled salmon binary directly.

You can find the changes in the binsalmon branch. Can you try it out and let me know if it works?

Once all the dependencies are installed, you just have to run:

git clone --recursive https://github.com/AlgoLab/galig.git
cd galig
git checkout binsalmon
make prerequisites
make

amitfenn commented 4 years ago

Thank you Luca,

I had earlier downloaded the entire salmon release v0.12.0, rather than just using the patch you provided: https://github.com/AlgoLab/galig/files/4437983/CMakeLists.txt.patch.txt. Your commands were handy too.

I had traced the CMakeLists.txt that you shared with me and I thought I should have gotten that version of Salmon.

Either way, I'm also super grateful for the way you fix errors: the patch, the pre-compiled binary, and the changes to your repo. Thank you for your quick response, for fixing this bug for others as well, and for your patience.

amitfenn commented 4 years ago

Dear Luca,

I thought this was a closed issue, but apparently not. I was trying ASGAL out with the --multi function and I think the error still has to do with ASGAL's interface with Salmon. Perhaps you'd be a better judge of what's exactly going on here.

(base) root@c08985285cee:/docker_main# asgal --multi -g ./Homo_sapiens.GRCh38.dna.primary_assembly.fa -a ./splicing_variants.gtf -s ./test_1.fastq -s2 ./test_2.fastq -t ./splicing_variants_transcripts.fa -o ./asgalresults
[ Apr 30, 2020 - 9:16:15PM ] Opening input annotation...
[ Apr 30, 2020 - 9:16:15PM ] Indexing...
[ Apr 30, 2020 - 9:18:01PM ] Splitting input annotation...
[##################################################] 37648/37648
[ Apr 30, 2020 - 9:19:08PM ] Done.
[ Apr 30, 2020 - 9:19:08PM ] Splitting input reference...
[ Apr 30, 2020 - 9:20:12PM ] Done.
[ Apr 30, 2020 - 9:20:12PM ] Running Salmon indexing...
Traceback (most recent call last):
  File "/opt/galig/asgal", line 536, in <module>
    main()
  File "/opt/galig/asgal", line 530, in main
    runSalmon(args)
  File "/opt/galig/asgal", line 169, in runSalmon
    stdout=open(salmonIndexLog, 'w'), stderr=open(salmonIndexLog, 'w'))
  File "/usr/lib/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'salmon'
(base) root@c08985285cee:/docker_main#

Note on Docker: containers often run into permission issues, so these files were copied into the container and the container runs as root. So I don't expect the permission error to really be coming from Docker.

Any help would be much appreciated, Thanks, Amit

ldenti commented 4 years ago

Hi Amit, did asgal work on the example data?

Can you please send me the salmon log you find in the logs subfolder (inside the output folder)?

Luca

amitfenn commented 4 years ago

❯ l
total 5,5G
drwxr-sr-x 6 afenn asdockers 4,0K Mai 11 09:05 .
drwxrws--- 14 tim asdockers 4,0K Mai 11 08:47 ..
drwxr-sr-x 2 afenn asdockers 1,5M Mai 11 09:02 annos
-rw-rw-r-- 1 afenn asdockers 1,2G Apr 30 22:50 Homo_sapiens.GRCh38.98.chr.gtf
-rw-rw-r-- 1 afenn asdockers 3,0G Apr 30 22:51 Homo_sapiens.GRCh38.dna.primary_assembly.fa
-rw-rw-r-- 1 afenn asdockers 354M Apr 30 22:51 Homo_sapiens.GRCh38.transcriptome.fa
drwxr-sr-x 2 afenn asdockers 4,0K Mai 11 09:04 logs
drwxr-sr-x 2 afenn asdockers 12K Mai 11 09:04 refs
drwxr-sr-x 4 afenn asdockers 4,0K Mai 11 09:04 salmon
-rw-r--r-- 1 afenn asdockers 146M Apr 30 22:51 splicing_variants.gtf
-rw-r--r-- 1 afenn asdockers 512M Apr 30 22:57 splicing_variants.gtf.db
-rw-rw-r-- 1 afenn asdockers 340M Apr 30 22:51 splicing_variants_transcripts.fa

❯ ls logs salmon -R
logs:
salmon_index.log

salmon:
salmon_index salmon_out

salmon/salmon_index:

salmon/salmon_out:

❯ cat logs/salmon_index.log

Sorry Luca, I double-checked this. The logs appear to be empty. Is there a verbose or debug mode I could try for more information?

I'd say asgal does not work for my example dataset. It does work without the --multi option.
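One possible reading of the empty log: the traceback earlier in this thread shows asgal opening the salmon log file before launching salmon, so if the launch itself fails, the file is created but nothing is ever written to it. A quick sketch for spotting such zero-byte logs, assuming a `logs` folder inside the output directory; `list_empty_logs` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every zero-byte regular file under a directory.
list_empty_logs() {
    find "${1:-.}" -type f -empty
}

# Assumed layout: run this from inside the asgal output folder.
if [ -d ./logs ]; then
    list_empty_logs ./logs
fi
```

An empty `salmon_index.log` alongside a Python traceback on the console would suggest that salmon never started, rather than that it ran and printed nothing.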

ldenti commented 4 years ago

Can you please run salmon from the terminal and let me know what it outputs? This is the command run by the asgal script:

{galig_repo}/salmon/bin/salmon index -p 2 -t [splicing_variants_transcripts.fa] -i [salmon_index]

amitfenn commented 4 years ago

Hi Luca,

I'm not sure what's going wrong anymore. I think the first error disappeared once I updated the PATH variable to include {galig_repo}/salmon/bin/.
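A sketch of that PATH change together with a quick check that `salmon` resolves to something executable. It assumes the repository lives at `/opt/galig` (the path that appears in the traceback earlier in this thread); `GALIG_REPO` and `check_tool` are names introduced here for illustration only:

```shell
#!/usr/bin/env bash
# Assumption: galig is cloned at /opt/galig; override GALIG_REPO otherwise.
GALIG_REPO="${GALIG_REPO:-/opt/galig}"
export PATH="$GALIG_REPO/salmon/bin:$PATH"

# Hypothetical helper: report whether a command name resolves to an
# executable file on the current PATH.
check_tool() {
    local tool="$1" path
    path="$(command -v "$tool")" || {
        echo "$tool: not found on PATH" >&2
        return 1
    }
    if [ -x "$path" ]; then
        echo "$tool: $path"
    else
        echo "$tool: $path is not executable" >&2
        return 1
    fi
}

check_tool salmon || echo "salmon is still not runnable; check the PATH entry above" >&2
```

To make the change persistent in a container, the same `PATH` entry would need to go into the image (for example via an `ENV` line in the Dockerfile) rather than only into the interactive shell.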

Here's the output of salmon since then:

salmon index -p 2 -t ./splicing_variants_transcripts.fa -i /myvol1/asgal_results/salmon/salmon_index/
Version Info: Could not resolve upgrade information in the alotted time.
Check for upgrades manually at https://combine-lab.github.io/salmon
[2020-05-12 15:22:26.894] [jLog] [info] building index
[2020-05-12 15:22:26.943] [jointLog] [info] [Step 1 of 4] : counting k-mers
[2020-05-12 15:22:28.521] [jointLog] [warning] Entry with header [ENSG00000281344_template] was longer than 200000 nucleotides.  Are you certain that we are indexing a transcriptome and not a genome?
[2020-05-12 15:22:29.909] [jointLog] [warning] Entry with header [ENSG00000249815_ir] was longer than 200000 nucleotides.  Are you certain that we are indexing a transcriptome and not a genome?
[2020-05-12 15:22:31.721] [jointLog] [warning] Entry with header [ENSG00000282431_template], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[2020-05-12 15:22:35.241] [jointLog] [warning] Entry with header [ENSG00000196376_ir] was longer than 200000 nucleotides.  Are you certain that we are indexing a transcriptome and not a genome?
[2020-05-12 15:22:35.257] [jointLog] [warning] Entry with header [ENSG00000286540_ir] was longer than 200000 nucleotides.  Are you certain that we are indexing a transcriptome and not a genome?
[2020-05-12 15:22:35.631] [jointLog] [warning] Entry with header [ENSG00000237838_ir] was longer than 200000 nucleotides.  Are you certain that we are indexing a transcriptome and not a genome?
[2020-05-12 15:22:41.118] [jointLog] [warning] Entry with header [ENSG00000270961_template], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[2020-05-12 15:22:41.124] [jointLog] [warning] Entry with header [ENSG00000270451_template], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[2020-05-12 15:22:42.275] [jointLog] [warning] Entry with header [ENSG00000184226_ir] was longer than 200000 nucleotides.  Are you certain that we are indexing a transcriptome and not a genome?
[2020-05-12 15:22:42.566] [jointLog] [warning] Entry with header [ENSG00000258394_ir] was longer than 200000 nucleotides.  Are you certain that we are indexing a transcriptome and not a genome?
[2020-05-12 15:22:42.972] [jointLog] [warning] Entry with header [ENSG00000211909_template], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[2020-05-12 15:22:42.972] [jointLog] [warning] Entry with header [ENSG00000227196_template], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[2020-05-12 15:22:42.972] [jointLog] [warning] Entry with header [ENSG00000211915_template], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[2020-05-12 15:22:42.972] [jointLog] [warning] Entry with header [ENSG00000227800_template], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[2020-05-12 15:22:42.972] [jointLog] [warning] Entry with header [ENSG00000211920_template], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[2020-05-12 15:22:42.972] [jointLog] [warning] Entry with header [ENSG00000211921_template], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[2020-05-12 15:22:42.972] [jointLog] [warning] Entry with header [ENSG00000232543_template], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[2020-05-12 15:22:42.972] [jointLog] [warning] Entry with header [ENSG00000237197_template], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[2020-05-12 15:22:42.972] [jointLog] [warning] Entry with header [ENSG00000233655_template], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[2020-05-12 15:22:42.975] [jointLog] [warning] Entry with header [ENSG00000254045_template], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
Elapsed time: 16.632s

[2020-05-12 15:22:43.575] [jointLog] [warning] Removed 2120 transcripts that were sequence duplicates of indexed transcripts.
[2020-05-12 15:22:43.575] [jointLog] [warning] If you wish to retain duplicate transcripts, please use the `--keepDuplicates` flag
[2020-05-12 15:22:43.632] [jointLog] [info] Replaced 3 non-ATCG nucleotides
[2020-05-12 15:22:43.632] [jointLog] [info] Clipped poly-A tails from 619 transcripts
[2020-05-12 15:22:43.703] [jointLog] [info] Building rank-select dictionary and saving to disk
[2020-05-12 15:22:43.727] [jointLog] [info] done
Elapsed time: 0.023286s
[2020-05-12 15:22:44.074] [jointLog] [info] Writing sequence data to file . . . 
[2020-05-12 15:22:44.251] [jointLog] [info] done
Elapsed time: 0.177109s
[2020-05-12 15:22:46.453] [jointLog] [info] Building 32-bit suffix array (length of generalized text is 344391509)
[2020-05-12 15:22:47.500] [jointLog] [info] Building suffix array . . . 
success
saving to disk . . . done
Elapsed time: 0.740055s
done
Elapsed time: 42.1645s
processed 344000000 positions
[2020-05-12 15:26:08.570] [jointLog] [info] khash had 142280847 keys
[2020-05-12 15:26:09.233] [jointLog] [info] saving hash to disk . . .
[2020-05-12 15:26:19.516] [jointLog] [info] done
Elapsed time: 10.2826s
[2020-05-12 15:26:33.946] [jLog] [info] done building index

...and here are the output files from running asgal, which works okay-ish, I think:

I have no name!@1a2dc4d0e695:/myvol1$ ls -lhtr ./asgal_results/
total - -G (edited to remove older files)
drwxr-sr-x 4 1491850500 1491900551 4.0K May 12 15:12 salmon
drwxr-sr-x 2 1491850500 1491900551 4.0K May 12 15:12 samples
drwxr-sr-x 3 1491850500 1491900551 4.0K May 12 15:12 logs
drwxr-sr-x 2 1491850500 1491900551 1.5M May 12 15:12 annos
drwxr-sr-x 2 1491850500 1491900551 4.0K May 12 15:12 ASGAL
I have no name!@1a2dc4d0e695:/myvol1$ ls -lhtr ./asgal_results/ASGAL/
total 708K
-rw-r--r-- 1 1491850500 1491900551 280K May 12 15:12 ENSG00000223972.mem
-rw-r--r-- 1 1491850500 1491900551 259K May 12 15:12 ENSG00000237491.mem
-rw-r--r-- 1 1491850500 1491900551 134K May 12 15:12 ENSG00000279928.mem
-rw-r--r-- 1 1491850500 1491900551  468 May 12 15:12 ENSG00000236397.mem
-rw-r--r-- 1 1491850500 1491900551  25K May 12 15:12 ENSG00000233614.mem
I have no name!@1a2dc4d0e695:/myvol1$ ls -lhtr ./asgal_results/logs/
total 24K
-rw-r--r-- 1 1491850500 1491900551  16K May 12 15:11 salmon_index.log
-rw-r--r-- 1 1491850500 1491900551 3.2K May 12 15:12 salmon_quant.log
-rw-r--r-- 1 1491850500 1491900551    0 May 12 15:12 samtools.log
drwxr-sr-x 2 1491850500 1491900551 4.0K May 12 15:12 ASGAL
I have no name!@1a2dc4d0e695:/myvol1$ 

However, I don't see any CSV files.

Furthermore, for a single run as well, we don't get any CSV files.

asgal --multi -g Homo_sapiens.GRCh38.dna.primary_assembly.fa -a splicing_variants.gtf -s test_1.fastq -s2 test_2.fastq -t splicing_variants_transcripts.fa -o asgal_results

In asgal_results/logs/ASGAL there was one log file named after a gene. Inside it:

Traceback (most recent call last):
  File "/opt/galig/scripts/detectEvents.py", line 6, in <module>
    from Bio import SeqIO
ModuleNotFoundError: No module named 'Bio'

In the asgal_results/ASGAL folder there was only a .mem file, but no events.csv.


Do you think we're using ASGAL wrong?

ldenti commented 4 years ago

It seems that you don't have biopython installed. But from your first message, it seems that you installed it... Can you import the Bio module from the python3 shell?
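The same check can be run without opening an interactive shell. This is a minimal sketch; `has_python_module` is a hypothetical helper name, and the only Biopython detail it relies on is the package's `Bio.__version__` attribute:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if python3 can import the given module.
has_python_module() {
    python3 -c "import $1" 2>/dev/null
}

if has_python_module Bio; then
    python3 -c "import Bio; print('Biopython', Bio.__version__)"
else
    # Show which interpreter was tried: a package installed for one
    # python3 is not necessarily visible to another one on the PATH.
    echo "Bio is not importable by $(command -v python3)" >&2
fi
```

Printing the interpreter path is deliberate: if more than one `python3` is installed in the container, the one that runs `detectEvents.py` may not be the one Biopython was installed for.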

amitfenn commented 4 years ago

I guess I should have taken a closer look at that error message. Sorry to have bothered you again with this, Luca, and thanks for double-checking on Biopython. I thought I had it, but I was mistaken.

I've updated my Dockerfile and it seems to work for me now. I've also attached a Dockerfile that pulls directly from Docker Hub and should work just fine. Feel free to share it with those who might need this tool in a Docker container.

Dockerfile-asgal.txt

Thank you for all your support, Luca

gdv commented 4 years ago

Dear Amit, thanks for the Dockerfile you provided. I would like to know if you have tried our Dockerfile or if you have written it from scratch.

amitfenn commented 4 years ago

FACEPALM... I wrote it from scratch because I didn't notice anything in the README. I should have paid more attention to your repo. I think your Dockerfile is more elegantly made. It would have saved me a lot of time.

Thanks for pointing it out.

gdv commented 4 years ago

This means that we have to put some notice in the README.

Have a nice day