kirasato0211 opened this issue 3 years ago
Hi, can you please send the tree view of the output folder (the output of the tree command is good enough)? Moreover, where did you get the GTF from? Can you share it?
Hi, below is a tree view of the output folder.
.
|-- ASGAL
|   |-- ENSG00000105976.events.csv
|   |-- ENSG00000105976.mem
|   |-- ENSG00000105976.sam
|   |-- ENSG00000146648.events.csv
|   |-- ENSG00000146648.mem
|   `-- ENSG00000146648.sam
|-- ASGAL.csv
|-- annos
|   |-- ENSG00000105976.gtf
|   |-- ENSG00000105976.gtf.db
|   |-- ENSG00000105976.gtf.sg
|   |-- ENSG00000146648.gtf
|   |-- ENSG00000146648.gtf.db
|   `-- ENSG00000146648.gtf.sg
|-- logs
|   |-- ASGAL
|   |   |-- ENSG00000105976
|   |   `-- ENSG00000146648
|   |-- salmon_index.log
|   |-- salmon_quant.log
|   `-- samtools.log
|-- refs
|   `-- chr7.fa
|-- salmon
|   |-- salmon.bam
|   |-- salmon.bam.bai
|   |-- salmon_index
|   |   |-- duplicate_clusters.tsv
|   |   |-- hash.bin
|   |   |-- header.json
|   |   |-- indexing.log
|   |   |-- quasi_index.log
|   |   |-- refInfo.json
|   |   |-- rsd.bin
|   |   |-- sa.bin
|   |   |-- txpInfo.bin
|   |   `-- versionInfo.json
|   `-- salmon_out
|       |-- aux_info
|       |   |-- ambig_info.tsv
|       |   |-- expected_bias.gz
|       |   |-- fld.gz
|       |   |-- meta_info.json
|       |   |-- observed_bias.gz
|       |   |-- observed_bias_3p.gz
|       |   `-- unmapped_names.txt
|       |-- cmd_info.json
|       |-- libParams
|       |   `-- flenDist.txt
|       |-- lib_format_counts.json
|       |-- logs
|       |   `-- salmon_quant.log
|       `-- quant.sf
`-- samples
    |-- ENSG00000105976.fa
    `-- ENSG00000146648.fa

12 directories, 45 files
I extracted the annotations for MET and EGFR from the GENCODE annotation and gave them as input to the ASGAL command.
Can you check whether the .fa files in the samples directory and the .gtf files in the annos directory are empty? Moreover, can you share the salmon.bam file?
I can see that the .fa files in the samples directory are non-empty, but one of the files in the annos directory is empty and the other one is non-empty. Please find attached a zip file containing the input annotation file (annotation2.gtf), the .gtf files from the annos directory, and the salmon.bam file: Attachment.zip
In annotation2.gtf, both transcripts of each gene share the same transcript_id. This produces unpredictable behaviour (the gffutils library combines the two transcripts into a single transcript).
I manually edited the annotation (you can find it here) and now the asgal output is non-empty.
Let me know if this new annotation fixes your problem.
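A minimal sketch for spotting this problem before running ASGAL, assuming the GTF has one `transcript` feature line per transcript (as GENCODE GTFs do): it groups `transcript` records by `transcript_id` and reports every ID that appears on more than one record. The function name `find_shared_transcript_ids` is hypothetical, not part of ASGAL or gffutils.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in GTF column 9, e.g. transcript_id "ENST00000318493.11";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def find_shared_transcript_ids(gtf_lines):
    """Return {transcript_id: [(start, end), ...]} for IDs used by more than one transcript record."""
    spans = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript-level records are counted
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            spans[match.group(1)].append((int(fields[3]), int(fields[4])))
    # Per the diagnosis above, gffutils merges records that share an ID into one transcript.
    return {tid: coords for tid, coords in spans.items() if len(coords) > 1}
```

For example, passing an open handle on annotation2.gtf to `find_shared_transcript_ids` should return an empty dict once every transcript has its own ID.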
Thank you for pointing it out. It helped me run ASGAL successfully. However, no event is reported in the output. I am expecting a MET exon 14 skipping event to be reported. I am sure that the samples I am using have the MET14 deletion event and the EGFR variation.
Please have a look at the modified annotation file I am using: annotation.zip. If I want the MET exon 14 skipping event to be reported in the output, do I need to include the exon 14 annotation in the GTF file?
Could you please help me understand how ASGAL reports events?
Hmm, that's strange (when I used the salmon.bam you shared with me, I found 2 events). So there must be some other issue.
Are the .sam files empty?
If you haven't done it yet, I suggest removing the .gtf.db files in the annos folder, since they may refer to the old (incomplete) annotation, and rerunning (or you can just change the output folder).
ASGAL reports a CSV file describing the events: one line per event, with the event type (e.g. ES for exon skipping), the genomic coordinates, and other information (the example you can find here describes the output format in more detail).
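A small sketch of the two checks suggested so far, assuming the output layout shown in the tree earlier in this thread (`samples/*.fa`, `annos/*.gtf`, `ASGAL/*.sam`, `ASGAL/*.mem`): one helper lists empty per-gene files, the other deletes the cached `annos/*.gtf.db` databases so they are not reused from an older annotation. Both helper names are hypothetical; changing the output folder, as suggested, has the same effect as deleting the databases.

```python
from pathlib import Path

# Per-gene intermediate files, relative to the ASGAL output folder (layout assumed from the tree).
INTERMEDIATE_PATTERNS = ("samples/*.fa", "annos/*.gtf", "ASGAL/*.sam", "ASGAL/*.mem")


def find_empty_outputs(output_dir):
    """Return the relative paths (sorted) of per-gene intermediate files that are empty."""
    out = Path(output_dir)
    return sorted(
        p.relative_to(out).as_posix()
        for pattern in INTERMEDIATE_PATTERNS
        for p in out.glob(pattern)
        if p.stat().st_size == 0
    )


def remove_cached_gtf_dbs(output_dir):
    """Delete annos/*.gtf.db files (possibly built from an older annotation); return how many."""
    dbs = list(Path(output_dir).glob("annos/*.gtf.db"))
    for db in dbs:
        db.unlink()
    return len(dbs)
```

Note that the pattern `annos/*.gtf` does not match `*.gtf.db`, so the databases are never reported as empty annotation files.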
Could you please let me know how you found 2 events using the salmon.bam file? The SAM files are not empty. I tried removing the .gtf.db files in the annos folder and also tried changing the output folder. Unfortunately, it didn't help: there are still no events reported in the output file.
I created the two samples using
samtools fastq -1 sample_1.fq -2 sample_2.fq salmon.bam
and then I ran asgal (as you did) on the edited annotation I sent you.
I checked the annotation you sent me and it contains only 1 transcript per gene (I'm using the initial one with 2 transcripts per gene). I think this is why you are not getting any events.
I ran ASGAL with the edited annotation file you sent me and it didn't work. Then I used the salmon.bam file to generate the samples with the command above and ran ASGAL again. Unfortunately, it didn't help.
Please send me the commands you have used to get the output.
I used these files (annotation, transcripts, and samples). The reference I used is chr7 only, downloaded from Ensembl (link). Edit: you have to change the header of the FASTA entry from >7 to >chr7, otherwise asgal crashes.
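The header change can be scripted. A minimal sketch, assuming the Ensembl header starts with the sequence ID `7` followed by an optional description: it renames the IDs listed in `mapping` and leaves all other lines untouched. `rename_fasta_headers` is a hypothetical helper; presumably the rename is needed because the GENCODE annotation names the chromosome `chr7`.

```python
def rename_fasta_headers(lines, mapping):
    """Yield FASTA lines, replacing the sequence ID of any header whose ID is a key in `mapping`."""
    for line in lines:
        if line.startswith(">"):
            # The ID is everything up to the first space; the rest is a free-text description.
            seq_id, _, description = line[1:].rstrip("\n").partition(" ")
            new_id = mapping.get(seq_id, seq_id)
            yield ">" + new_id + (" " + description if description else "") + "\n"
        else:
            yield line  # sequence lines pass through unchanged
```

For example, with `src` opened on Homo_sapiens.GRCh38.dna.chromosome.7.fa and `dst` opened for writing, `dst.writelines(rename_fasta_headers(src, {"7": "chr7"}))` writes the renamed genome.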
I then ran:
asgal --multi -g Homo_sapiens.GRCh38.dna.chromosome.7.fa -a annotation2.edit.gtf -s sample_1.fq -s2 sample_2.fq -t transcripts2.fa --allevents -o output
and I obtained these events:
Type,Start,End,Support,Transcripts,file
ES,55019366,55155829,16,ENST00000275493.7,output/ASGAL/ENSG00000146648.events.csv
ES,116771655,116774880,24,ENST00000318493.11,output/ASGAL/ENSG00000105976.events.csv
Let me know if these files work for you.
Thank you. These files worked for me. The genome file I was using earlier had chr7 in the header, but it had repeat-masked bases and 50 bases per line. Maybe that was causing the issue.
Thank you for your help with this. I have tested it on positive samples and will soon test it on negative samples too.
Hi, please note that the official asgal release (not the one on the Agilent hub) is now at version 1.1.2. Could you please try that version? It should suffice to run
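The thread does not establish whether masking or line width was the actual problem, but both are easy to check. A minimal sketch that reports the sequence line widths of a FASTA file and how many bases are lowercase (soft-masked) or `N` (hard-masked); `fasta_masking_summary` is a hypothetical helper. Note that the Ensembl file used above (`...dna.chromosome.7.fa`) is the unmasked `dna` variant, as opposed to `dna_sm` (soft-masked) and `dna_rm` (hard-masked).

```python
def fasta_masking_summary(lines):
    """Summarise sequence line widths and soft-/hard-masked base counts of FASTA lines."""
    total = soft = hard = 0
    widths = set()
    for line in lines:
        if line.startswith(">"):
            continue  # headers are not sequence
        seq = line.strip()
        if not seq:
            continue
        widths.add(len(seq))
        total += len(seq)
        soft += sum(c.islower() for c in seq)  # lowercase = soft-masked (a lowercase 'n' counts twice)
        hard += seq.upper().count("N")         # N = hard-masked or unknown base
    return {"bases": total, "soft_masked": soft, "N": hard, "line_widths": sorted(widths)}
```

If only soft-masking shows up, uppercasing the sequence would remove it; hard-masked `N` bases cannot be recovered, so downloading the unmasked file is the safer choice.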
sudo docker run -v "$PWD"/asgalgw_data:/data registry-dev.scs.agilent.com/algolab/galig:v1.1.1
Best
Could you please let me know where I can get the latest asgal release (version 1.1.2)?
Sorry, I did not update the command. It should be
sudo docker run -v "$PWD"/asgalgw_data:/data algolab/asgal
The latest docker image is at https://hub.docker.com/r/algolab/asgal
Best regards
Thank you. I am able to run ASGAL using this image. However, I came across the error below.
subprocess.CalledProcessError: Command '['/galig/bin/SpliceAwareAligner', '-g', '/data/output/refs/chr7.fa', '-a', '/data/output/annos/ENSG00000105976.gtf', '-s', '/data/output/samples/ENSG00000105976.fa', '-l', '15', '-e', '3', '-o', '/data/output/ASGAL/ENSG00000105976.mem']' died with <Signals.SIGILL: 4>.
Could you please let me know where it's going wrong?
Could you please let me know why I am getting the error below when running ASGAL in Docker?
subprocess.CalledProcessError: Command '['/galig/bin/SpliceAwareAligner', '-g', '/data/output/refs/chr7.fa', '-a', '/data/output/annos/ENSG00000105976.gtf', '-s', '/data/output/samples/ENSG00000105976.fa', '-l', '15', '-e', '3', '-o', '/data/output/ASGAL/ENSG00000105976.mem']' died with <Signals.SIGILL: 4>.
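For context on the message itself: Python's `subprocess` reports a child that was killed by a signal as a negative return code, and `CalledProcessError` renders a return code of `-4` as `died with <Signals.SIGILL: 4>`. SIGILL (illegal instruction) means the CPU hit an instruction it could not execute, so the aligner binary crashed rather than exiting with an error message. A minimal sketch that decodes such return codes, assuming a POSIX system; the helper names are hypothetical.

```python
import signal
import subprocess


def describe_returncode(returncode):
    """Translate a subprocess return code; negative values mean 'killed by signal -returncode' (POSIX)."""
    if returncode < 0:
        sig = signal.Signals(-returncode)
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {returncode}"


def run_and_describe(cmd):
    """Run cmd; return None on success, otherwise a short description of how it failed."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as err:
        return describe_returncode(err.returncode)
    return None
```

With this, `describe_returncode(-4)` gives `killed by SIGILL (signal 4)` and `describe_returncode(-11)` gives `killed by SIGSEGV (signal 11)`, the two failures reported in this thread.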
Are you using the example data?
No, I am using my own dataset.
Did it work on the example data? Can you also please send the entire log output written to stderr?
I received the same error on the example data too. I used the data from example/input for the test. Please find the error message from the console below.
I am unable to upload the error file here, so I am copying the console log instead.
ubuntu@rnaseq-asgal1:~$ sudo docker run -v "$PWD"/asgalgw_data:/data algolab/asgal:v1.1.2
Starting with UID:GID 0:0
[ Mar 11, 2021 - 9:12:13AM ] args Namespace(allevents=False, annoPath='/data/annotation.gtf', debug=False, e='3', l='15', multiMode=False, outputPath='/data/output', refPath='/data/genome.fa', sample1Path='/data/sample_1.fa', sample2Path='-', split_only=False, threads='2', transPath='-', verbose=False, w='3')
[ Mar 11, 2021 - 9:12:13AM ] Opening input annotation...
[ Mar 11, 2021 - 9:12:13AM ] Indexing...
[ Mar 11, 2021 - 9:12:13AM ] Reading input annotation...
[ Mar 11, 2021 - 9:12:13AM ] number of genes 1
[##################################################] 1/1
[ Mar 11, 2021 - 9:12:13AM ] Done.
[ Mar 11, 2021 - 9:12:13AM ] Running ASGAL on 1 gene...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/galig/asgal", line 390, in asgal_command_one_gene
    command_check_return(asgal_CMDs['align'], log, log, verbose=args.verbose)
  File "/galig/asgal", line 62, in command_check_return
    completed_process.check_returncode()
  File "/usr/lib/python3.6/subprocess.py", line 389, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['/galig/bin/SpliceAwareAligner', '-g', '/data/genome.fa', '-a', '/data/annotation.gtf', '-s', '/data/sample_1.fa', '-l', '15', '-e', '3', '-o', '/data/output/sample_1-FBgn0040370.mem']' died with <Signals.SIGILL: 4>.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/galig/asgal", line 585, in
I tried the example data with a clean installation of Docker and asgal:v1.1.2 and it worked. Maybe older images are causing the issue? Can you please remove all old asgal images and try again?
Moreover, since v1.1.1 worked for you in December: are you using a different machine or OS? Have you updated Docker recently?
I tried the command-line ASGAL tool on a few samples in December and it worked for me. Now I want to try it with Docker so that I can run it on more samples. The machine I used then and the one I am using now are different, but the OS is the same, i.e. Ubuntu 18.04.
I have installed Docker on the machine where command-line ASGAL ran successfully. Even on that machine I came across the same error:
[ Mar 12, 2021 - 6:06:14AM ] Running ASGAL on 2 genes...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/galig/asgal", line 390, in asgal_command_one_gene
    command_check_return(asgal_CMDs['align'], log, log, verbose=args.verbose)
  File "/galig/asgal", line 62, in command_check_return
    completed_process.check_returncode()
  File "/usr/lib/python3.6/subprocess.py", line 389, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['/galig/bin/SpliceAwareAligner', '-g', '/data/output/refs/chr7.fa', '-a', '/data/output/annos/ENSG00000146648.gtf', '-s', '/data/output/samples/ENSG00000146648.fa', '-l', '15', '-e', '3', '-o', '/data/output/ASGAL/ENSG00000146648.mem']' died with <Signals.SIGSEGV: 11>.
"""
Please let me know if you can help me with this.
How much RAM does the machine have? Can you share with us the files that you have used?
RAM : 16 GB
I am unable to upload files here. It would be great if you could use the files attached to previous comments in this thread; I am using the same files for the current run.
I used the same files I linked in https://github.com/AlgoLab/galig/issues/12#issuecomment-740812653 and it worked. After setting up the inputs I ran:
docker run -v "$PWD"/2genes:/data algolab/asgal:v1.1.2
Can you please send here:
the output of docker -v
the output of docker info
the output of docker images -a | grep asgal
the logs folder you can find in the output folder created inside the folder containing the input you are using
The output of docker -v:
Docker version 19.03.6, build 369ce74a3c
sudo docker info
Client:
 Debug Mode: false

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 1
 Server Version: 19.03.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-136-generic
 Operating System: Ubuntu 18.04 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.17GiB
 Name: rnaseq-asgal1
 ID: VLPG:C3ZT:P3Z7:JMY2:WJCN:VNM5:ZDRJ:BFIZ:7UNI:3AZA:OUAW:R6T3
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
WARNING: No swap limit support
sudo docker images -a | grep asgal
algolab/asgal v1.1.2 f67ef7b50dd0 3 months ago 1.23GB
the logs folder you can find in the output folder created inside the folder containing the input you are using
~/asgalgw_data/output/logs$ ls -la
total 20
drwxr-xr-x 3 root root 4096 Mar 11 06:33 .
drwxr-xr-x 8 root root 4096 Mar 11 06:33 ..
drwxr-xr-x 2 root root 4096 Mar 11 06:33 ASGAL
-rw-r--r-- 1 root root 1215 Mar 11 06:30 salmon_index.log
-rw-r--r-- 1 root root 3768 Mar 11 06:32 salmon_quant.log
-rw-r--r-- 1 root root 0 Mar 11 06:32 samtools.log
~/asgalgw_data/output/logs$ cd ASGAL
~/asgalgw_data/output/logs/ASGAL$ ls -la
total 8
drwxr-xr-x 2 root root 4096 Mar 11 06:33 .
drwxr-xr-x 3 root root 4096 Mar 11 06:33 ..
-rw-r--r-- 1 root root 0 Mar 11 06:33 ENSG00000105976
-rw-r--r-- 1 root root 0 Mar 11 06:33 ENSG00000146648
Is the likely culprit the fact that ENSG00000105976 is empty?
@kirasato0211 Just to exclude some obvious causes: the disk (or the disk quota) is not full, is it?
The only difference I see is that I have Docker version 20.10.5 (build 55c4c88). I installed it from here. Did you install Docker from the official Ubuntu repositories? Would it be possible for you to update Docker?
Can you also send us the zipped log folder?
Just to exclude some obvious causes, the disk (or the disk quota) is not full, isn't it?
--> disk has enough space.
I am looking into installing Docker version 20.10.5 and will get back to you.
I am unable to install Docker version 20.10.5 on my machine. As mentioned earlier, I get an error when attaching the zipped file here.
I have installed docker version you have mentioned.
ubuntu@rnaseq-asgal1:~$ docker --version Docker version 20.10.5, build 55c4c88
I am still receiving the same error:
ubuntu@rnaseq-asgal1:~$ sudo docker run -v "$PWD"/asgalgw_data1:/data algolab/asgal:v1.1.2
Changing user
Starting with UID:GID 1000:1000
[ Mar 18, 2021 - 8:54:47AM ] args Namespace(allevents=False, annoPath='/data/annotation.gtf', debug=False, e='3', l='15', multiMode=True, outputPath='/data/output', refPath='/data/genome.fa', sample1Path='/data/sample_1.fa', sample2Path='/data/sample_2.fa', split_only=False, threads='2', transPath='/data/transcripts.fa', verbose=False, w='3')
[ Mar 18, 2021 - 8:54:47AM ] Opening input annotation...
[ Mar 18, 2021 - 8:54:47AM ] Indexing...
[ Mar 18, 2021 - 8:54:47AM ] Splitting input annotation...
[ Mar 18, 2021 - 8:54:47AM ] number of genes 2
[##################################################] 2/2
[ Mar 18, 2021 - 8:54:48AM ] Done.
[ Mar 18, 2021 - 8:54:48AM ] Splitting input reference...
[ Mar 18, 2021 - 8:54:55AM ] Done.
[ Mar 18, 2021 - 8:54:55AM ] Running Salmon indexing...
[ Mar 18, 2021 - 8:54:55AM ] Done.
[ Mar 18, 2021 - 8:54:55AM ] Running Salmon quasi-mapping on paired-end sample...
[ Mar 18, 2021 - 8:56:09AM ] Done.
[ Mar 18, 2021 - 8:56:09AM ] Retrieving unmapped reads...
[ Mar 18, 2021 - 8:56:13AM ] Parsing /data/sample_1.fa to retrieve reads...
[ Mar 18, 2021 - 8:56:43AM ] Parsing /data/sample_2.fa to retrieve reads...
[ Mar 18, 2021 - 8:57:15AM ] Done.
[ Mar 18, 2021 - 8:57:15AM ] Unmapped reads that will be remapped: 0
[ Mar 18, 2021 - 8:57:15AM ] Done.
[ Mar 18, 2021 - 8:57:15AM ] Splitting Salmon BAM...
[ Mar 18, 2021 - 8:57:15AM ] Done.
[ Mar 18, 2021 - 8:57:15AM ] Running ASGAL on 2 genes...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/galig/asgal", line 390, in asgal_command_one_gene
    command_check_return(asgal_CMDs['align'], log, log, verbose=args.verbose)
  File "/galig/asgal", line 62, in command_check_return
    completed_process.check_returncode()
  File "/usr/lib/python3.6/subprocess.py", line 389, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['/galig/bin/SpliceAwareAligner', '-g', '/data/output/refs/chr7.fa', '-a', '/data/output/annos/ENSG00000105976.gtf', '-s', '/data/output/samples/ENSG00000105976.fa', '-l', '15', '-e', '3', '-o', '/data/output/ASGAL/ENSG00000105976.mem']' died with <Signals.SIGILL: 4>.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/galig/asgal", line 585, in
Please help me in this case. I appreciate your help in advance.
Command-line ASGAL runs successfully from inside the container on the same dataset.
That's strange. Unfortunately it's not easy to troubleshoot this problem since it works on my machine...
Are you using the same data (I mean: exactly the same files) when running asgal via docker and from within the container? Can you please send us the output of the following command?
ls -l asgalgw_data1
Yes, I am using the same dataset to run the command from inside the container. Below is the console output of the ASGAL command run from the container.
root@21f9efea9953:/data# /galig/asgal -g genome.fa -a annotation.gtf -s sample_1.fa -s2 sample_2.fa -t transcripts.fa -o output/
[ Mar 19, 2021 - 10:44:07AM ] args Namespace(allevents=False, annoPath='annotation.gtf', debug=False, e='3', l='15', multiMode=False, outputPath='output/', refPath='genome.fa', sample1Path='sample_1.fa', sample2Path='sample_2.fa', split_only=False, threads='2', transPath='transcripts.fa', verbose=False, w='3')
[ Mar 19, 2021 - 10:44:07AM ] Opening input annotation...
[ Mar 19, 2021 - 10:44:07AM ] Reading input annotation...
[ Mar 19, 2021 - 10:44:07AM ] number of genes 2
[##################################################] 2/2
[ Mar 19, 2021 - 10:44:07AM ] Done.
[ Mar 19, 2021 - 10:44:07AM ] Running ASGAL on 1 gene...
^Z
[1]+  Stopped                 /galig/asgal -g genome.fa -a annotation.gtf -s sample_1.fa -s2 sample_2.fa -t transcripts.fa -o output/
root@21f9efea9953:/data# /galig/asgal --multi -g genome.fa -a annotation.gtf -s sample_1.fa -s2 sample_2.fa -t transcripts.fa -o output/
[ Mar 19, 2021 - 10:49:02AM ] args Namespace(allevents=False, annoPath='annotation.gtf', debug=False, e='3', l='15', multiMode=True, outputPath='output/', refPath='genome.fa', sample1Path='sample_1.fa', sample2Path='sample_2.fa', split_only=False, threads='2', transPath='transcripts.fa', verbose=False, w='3')
[ Mar 19, 2021 - 10:49:02AM ] Opening input annotation...
[ Mar 19, 2021 - 10:49:02AM ] Splitting input annotation...
[ Mar 19, 2021 - 10:49:02AM ] number of genes 2
[##################################################] 2/2
[ Mar 19, 2021 - 10:49:02AM ] Done.
[ Mar 19, 2021 - 10:49:02AM ] Splitting input reference...
[ Mar 19, 2021 - 10:49:09AM ] Done.
[ Mar 19, 2021 - 10:49:10AM ] Running Salmon indexing...
[ Mar 19, 2021 - 10:49:10AM ] Done.
[ Mar 19, 2021 - 10:49:10AM ] Running Salmon quasi-mapping on paired-end sample...
[ Mar 19, 2021 - 10:50:24AM ] Done.
[ Mar 19, 2021 - 10:50:24AM ] Retrieving unmapped reads...
[ Mar 19, 2021 - 10:50:28AM ] Parsing sample_1.fa to retrieve reads...
[ Mar 19, 2021 - 10:51:01AM ] Parsing sample_2.fa to retrieve reads...
[ Mar 19, 2021 - 10:51:35AM ] Done.
[ Mar 19, 2021 - 10:51:35AM ] Unmapped reads that will be remapped: 0
[ Mar 19, 2021 - 10:51:35AM ] Done.
[ Mar 19, 2021 - 10:51:35AM ] Splitting Salmon BAM...
[ Mar 19, 2021 - 10:51:35AM ] Done.
[ Mar 19, 2021 - 10:51:35AM ] Running ASGAL on 2 genes...
[ Mar 19, 2021 - 10:51:48AM ] Done.
root@21f9efea9953:/data#
ubuntu@rnaseq-asgal1:~$ ls -l asgalgw_data1/
total 2447404
-rwxrwxrwx 1 ubuntu ubuntu 88896 Mar 18 09:45 annotation.gtf
-rw-r--r-- 1 ubuntu ubuntu 249856 Mar 19 10:24 annotation.gtf.db
-rwxrwxrwx 1 ubuntu ubuntu 162001746 Mar 18 09:45 genome.fa
drwxr-xr-x 8 ubuntu ubuntu 4096 Mar 19 10:26 output
-rwxrwxrwx 1 ubuntu ubuntu 1171878283 Mar 8 11:21 sample_1.fa
-rwxrwxrwx 1 ubuntu ubuntu 1171878283 Mar 8 11:22 sample_2.fa
-rwxrwxrwx 1 ubuntu ubuntu 17102 Mar 18 09:45 transcripts.fa
I have FASTQ samples, but the Docker image needs .fa files, so I converted the samples from FASTQ to FASTA using the command below.
seqtk seq -a in.fastq.gz > out.fa
The command run from inside the container used these converted .fa files and ran successfully, so ideally it should work using Docker too.
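For reference, the `seqtk seq -a` conversion can be sketched in plain Python, assuming standard 4-line FASTQ records (sequence and quality each on a single line; seqtk also handles multi-line records, which this sketch does not). It keeps the read name line and drops the qualities. `fastq_to_fasta` and `convert_fastq_gz` are hypothetical helpers.

```python
import gzip
from itertools import islice


def fastq_to_fasta(fastq_lines):
    """Yield one two-line FASTA record string per 4-line FASTQ record."""
    lines = iter(fastq_lines)
    while True:
        record = list(islice(lines, 4))  # header, sequence, '+' separator, qualities
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {record[0].rstrip()!r}")
        yield ">" + record[0][1:].rstrip("\n") + "\n" + record[1].rstrip("\n") + "\n"


def convert_fastq_gz(fastq_gz_path, fasta_path):
    """Stream a gzip-compressed FASTQ file into an uncompressed FASTA file."""
    with gzip.open(fastq_gz_path, "rt") as src, open(fasta_path, "w") as dst:
        dst.writelines(fastq_to_fasta(src))
```

Streaming record by record keeps memory use constant, which matters for samples of more than 1 GB like the ones listed above.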
I tested it on an old machine I have (docker 19.03.8) and I got the same error. So I can confirm that the problem is docker v19.03. Unfortunately we cannot easily troubleshoot this problem on that machine...
Is it a problem for you to run asgal directly from inside the Docker container?
I uninstalled Docker version 19.03.8 and installed version 20.10.5, build 55c4c88. Even after installing the new version, I am facing the same error.
Have you tried to purge asgal images and repull them?
I tried that before and gave it another try now. It didn't work for me.
ASGAL ran successfully on the image I built using the Dockerfile in /galig/docker, but it identified events in the example data and not in our test data. I am looking into it, but I would appreciate your help with this.
Also, please let me know if there is a way that I can pass fastq sample files to ASGAL in docker instead of .fa samples.
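One hypothesis, not confirmed in this thread: a common cause of SIGILL from a prebuilt binary is that it was compiled for CPU instruction-set extensions (for example AVX2, or with `-march=native`) that the host CPU lacks, which would also be consistent with a locally built image working. A minimal sketch to compare CPU feature flags between a machine where the Docker Hub image works and one where it crashes, assuming Linux on x86 (where `/proc/cpuinfo` has a `flags` line); the helper names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the CPU feature flags listed on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line (non-x86 kernels use e.g. 'Features' instead)


def flags_missing_on(failing_cpuinfo, working_cpuinfo):
    """Sorted flags present where the image works but absent where it crashes."""
    return sorted(cpu_flags(working_cpuinfo) - cpu_flags(failing_cpuinfo))
```

Reading `/proc/cpuinfo` on both machines and passing the two texts to `flags_missing_on` lists the candidate instruction sets; an empty result would argue against this hypothesis.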
Hi, thanks for letting us know that. It's strange that the image you created worked with no issue: did you use docker v20? Maybe there is something wrong with the image on docker hub - I really don't know what to say here.
Anyway, I updated the docker script to accept samples in both FASTA (.fa) and FASTQ (.fq) format and pushed a new docker image (v1.1.3).
Can you try the new version and let us know? Thanks!
I received the same error (the one I was facing earlier) with this image (v1.1.3) too.
I made changes to asgal-docker.sh to accept FASTQ samples and also added the --allevents parameter to the commands.
After these changes, ASGAL accepts the FASTQ files and also reports the events we are interested in.
I appreciate your help in this case.
Hi, I am using the ASGAL tool to find the MET14 deletion and EGFR variation events in my samples. I am running a genome-wide analysis for these two genes. The ASGAL run completes successfully, but the .mem and .sam files are empty, and hence no events are reported in the final files.
The command I used is below:
./asgal --multi -g genome.fa -a annotation2.gtf -s sample1.fastq.gz -s2 sample2.fastq.gz -t transcript.fa --allevents -o output
Could you please help me with this as soon as possible?