Dill-PICL / GOMAP-singularity

GOMAP-Singularity is the containerized version of GOMAP
http://gomap.blunderingbioinformatics.org/
MIT License

Got error message in the mixmeth step #23

Closed wdlingit closed 3 years ago

wdlingit commented 3 years ago

Describe the bug: I tried the test step as described in https://bioinformapping.com/gomap/master/RUNNING.html and got an error message in the mixmeth step: requests.exceptions.ConnectionError: ('Connection aborted.', error(104, 'Connection reset by peer'))

Input File: The FASTA in the downloaded test folder

GOMAP step that crashed (if applicable): mixmeth

Attach the output files (following https://bioinformapping.com/gomap/v1.3.5/RUNNING.html):

wdlin@galaxy:/RAID3/Projects/20210826_MASK/TEST$ module list

Currently Loaded Modules:
  1) GCCcore/8.3.0   2) zlib/1.2.11-GCCcore-8.3.0   3) binutils/2.32-GCCcore-8.3.0   4) GCC/8.3.0   5) Singularity/3.5.3-GCC-8.3.0   6) LibUUID/1.0.3-GCCcore-8.3.0

(following https://bioinformapping.com/gomap/v1.3.5/RUNNING.html)
wdlin@galaxy:/RAID3/Projects/20210826_MASK/TEST$ git clone https://github.com/Dill-PICL/GOMAP-singularity.git .

wdlin@galaxy:/RAID3/Projects/20210826_MASK/TEST$ git checkout v1.3.5
Note: switching to 'v1.3.5'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at f490d4b added the latest setup file

wdlin@galaxy:/RAID3/Projects/20210826_MASK/TEST$ ./setup.sh

wdlin@galaxy:/RAID3/Projects/20210826_MASK/TEST$ cat test/config.yml
#Input section
input:
  #input fasta file name
  fasta: 0.3_GOMAP-input.fa
  # output file basename
  basename: 0.3_GOMAP-input
  #input NCBI taxonomy id
  taxon: "4577"
  # Name of the species
  species: "Zea mays"
  # Email is mandatory
  email: my@email.addr
  #Number of CPUs used for tools
  cpus: 4
  #Whether openmpi should be used
  mpi: False
  #what the name of the temporary directory is
  tmpdir: "/tmpdir"
  # These are for testing purposes only. Remove them for running on a genome dataset
  # These chnunks are too small for annotations when the number of genes exceed 500
  num_seqs: 30
  small_seqs: 10

wdlin@galaxy:/RAID3/Projects/20210826_MASK/TEST$ srun test.sh
(above message seems OK)
Running mixed-method based annotations
Submitting 0.3_GOMAP-input.1.tsv.zip and 0.3_GOMAP-input.1.hmm.out.zip to Argot2.5
Traceback (most recent call last):
  File "./gomap.py", line 93, in <module>
    run_mixmeth(config)
  File "/opt/GOMAP/code/gomap_mixmeth.py", line 31, in run_mixmeth
    submit_argot2(config)
  File "/opt/GOMAP/code/pipeline/run_argot2.py", line 176, in submit_argot2
    r_insert = s.post(argot_url,data=payload,files=files,headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 559, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 495, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', error(104, 'Connection reset by peer'))
srun: error: node41: task 0: Exited with exit code 1

Intermediate output file (if applicable):
FILE: logs/0.3_GOMAP-input-mixmeth.log
INFO [2021-09-25 15:24] Starting to run the pipline for 0.3_GOMAP-input
INFO [2021-09-25 15:24] Running mixed-method based annotations
INFO [2021-09-25 15:24] Submitting the batch inputs to Argot2
INFO [2021-09-25 15:24] Submitting 0.3_GOMAP-input.1.tsv.zip and 0.3_GOMAP-input.1.hmm.out.zip to Argot2.5
DEBUG [2021-09-25 15:24] Starting new HTTP connection (1): www.medcomp.medicina.unipd.it:80
DEBUG [2021-09-25 15:24] http://www.medcomp.medicina.unipd.it:80 "POST /Argot2-5/form_batch.php HTTP/1.1" 200 5453

System Details

Additional context: I actually tried my own dataset first and got the same error message at the same mixmeth step, so I then tried the test case.

wdlingit commented 3 years ago

I am fine with manually submitting the files to Argot2.5 and placing the result files where the final GOMAP step expects them. Please let me know how to do this, if it is possible.

wkpalan commented 3 years ago

Hey @wdlingit,

It should be fine doing the Argot2 step manually, but the server sometimes is unavailable. It might be worth trying again before running things manually.

I will try to update the documentation with the manual steps if Argot2.5 keeps failing.

Thanks

Best Kokul

wdlingit commented 3 years ago

Using both my data and test.sh, I have tried about 5 times (including once on a fresh Ubuntu 16 VM). All attempts failed at the same step. However, the Argot2.5 server is usually reachable from my desktop browser. Is it possible that a change on the Argot2.5 server is causing this?

wkpalan commented 3 years ago

I am running the test again now. It worked the day before yesterday. I will update once it has completed. If it fails, then I will update with manual running instructions.

  1. You will have to delete everything in the test directory and run git checkout test
  2. Add your email instead of mine to the test/config.yml file and re-run test.sh.

wkpalan commented 3 years ago

I just ran the test and it went through with no issues on my machine. It's a Windows machine with WSL2. I have also tried from the cluster and it does perform well.

wkpalan commented 3 years ago

The manual upload of Argot2.5 would be as follows for the test.

The files are located at test/GOMAP-0.3_GOMAP-input/tmp/mixed-meth/argot2.5

The blast directory contains

blast/0.3_GOMAP-input.1.tsv.zip
blast/0.3_GOMAP-input.2.tsv.zip

The hmmer directory contains

hmmer/0.3_GOMAP-input.1.hmm.out.zip
hmmer/0.3_GOMAP-input.2.hmm.out.zip

You can upload the zip files in pairs (the blast and hmmer files with the same index) to the Argot2.5 web server and download the results. The downloaded zip files should be saved as follows.

results/0.3_GOMAP-input.1.tsv.zip
results/0.3_GOMAP-input.2.tsv.zip

As long as these files are there, GOMAP should complete without issue.
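Before running the aggregate step after a manual upload, the file layout described above can be checked with a small script. This is only a sketch under the assumption stated in this comment, namely that each blast/*.tsv.zip input needs a zip of the same name under results/. The function name missing_argot_results is made up for illustration and is not part of GOMAP.

```python
from pathlib import Path


def missing_argot_results(argot_dir):
    """List result zips that are expected (one per blast input) but absent.

    Assumption: results/<name>.tsv.zip must exist for every
    blast/<name>.tsv.zip, as described in the comment above.
    """
    argot_dir = Path(argot_dir)
    # Every blast input zip defines one expected result file name.
    expected = sorted(p.name for p in (argot_dir / "blast").glob("*.tsv.zip"))
    results_dir = argot_dir / "results"
    return [name for name in expected if not (results_dir / name).is_file()]


if __name__ == "__main__":
    # Path taken from the comment above; adjust it for your own basename.
    base = "test/GOMAP-0.3_GOMAP-input/tmp/mixed-meth/argot2.5"
    for name in missing_argot_results(base):
        print("missing:", name)
```

An empty list only means that the files are in place. It says nothing about whether their contents are valid Argot2.5 results.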

wdlingit commented 3 years ago

Thank you for the instructions. Sorry for the delay; I was busy with other work. I retried test.sh in a new Ubuntu 18 VM with Singularity 3.5.2: git checkout v1.3.5, ./setup.sh, edited test/config.yml to use my email, and ran ./test.sh.

This time I got the same error message:

Completed Running mixmeth-preproc step
Running mixed-method based annotations
Submitting 0.3_GOMAP-input.2.tsv.zip and 0.3_GOMAP-input.2.hmm.out.zip to Argot2.5
Submitting 0.3_GOMAP-input.1.tsv.zip and 0.3_GOMAP-input.1.hmm.out.zip to Argot2.5
Traceback (most recent call last):
  File "./gomap.py", line 93, in <module>
    run_mixmeth(config)
  File "/opt/GOMAP/code/gomap_mixmeth.py", line 31, in run_mixmeth
    submit_argot2(config)
  File "/opt/GOMAP/code/pipeline/run_argot2.py", line 176, in submit_argot2
    r_insert = s.post(argot_url,data=payload,files=files,headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 559, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 495, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', error(104, 'Connection reset by peer'))

BUT I got a "Job submitted" email from the Argot2.5 server and then a "Job COMPLETED!" email a few minutes later. So I ran "./test.sh aggregate" for the final test step, and it seemed OK. To be safe, I copied test/0.3_GOMAP-input.fa to the GOMAP directory for another test, treating the test FASTA as a regular input.

ubuntu@singularity:~/GOMAP1$ cp test/0.3_GOMAP-input.fa .

ubuntu@singularity:~/GOMAP1$ cat min-config.yml
#Input section
input:
  #input fasta file name
  fasta: 0.3_GOMAP-input.fa
  # output file basename
  basename: test2
  #input NCBI taxonomy id
  taxon: "4577"
  # Name of the species
  species: "Zea mays"
  # Email is mandatory
  email: my@email.addr
  #Number of CPUs used for tools
  cpus: 4
  #Whether openmpi should be used
  mpi: False
  #what the name of the temporary directory is
  tmpdir: "/tmpdir"

(this Perl one-liner generates and runs the commands for the GOMAP steps: seqsim domain fanngo mixmeth-blast mixmeth-preproc mixmeth)
ubuntu@singularity:~/GOMAP1$ echo "seqsim domain fanngo mixmeth-blast mixmeth-preproc mixmeth" | perl -ne 'if($.==1){ $msg=`ls min-config.yml`; chomp $msg; @files=split(/\s+/,$msg) } chomp; @t=split; for $x (@t){ for $f (@files){ $f=~/_(\d+)\./; $cmd="./run-GOMAP-SINGLE.sh --step=$x --config=$f"; print "\nCMD: $cmd\n"; system $cmd } }'

One day later, I got the same error message and NO notifications from argot2.5 server.

CMD: ./run-GOMAP-SINGLE.sh --step=seqsim --config=min-config.yml
(seems OK, log omitted)

CMD: ./run-GOMAP-SINGLE.sh --step=domain --config=min-config.yml
(seems OK, log omitted)

CMD: ./run-GOMAP-SINGLE.sh --step=fanngo --config=min-config.yml
(seems OK, log omitted)

CMD: ./run-GOMAP-SINGLE.sh --step=mixmeth-blast --config=min-config.yml
(seems OK, log omitted)

CMD: ./run-GOMAP-SINGLE.sh --step=mixmeth-preproc --config=min-config.yml
(seems OK, log omitted)

CMD: ./run-GOMAP-SINGLE.sh --step=mixmeth --config=min-config.yml
/tmp:/tmp,/home/ubuntu/GOMAP1:/workdir,/home/ubuntu/GOMAP1/tmp:/tmpdir,
Running GOMAP --step=mixmeth --config=min-config.yml
Running mixed-method based annotations
Submitting test2.1.tsv.zip and test2.1.hmm.out.zip to Argot2.5
Traceback (most recent call last):
  File "./gomap.py", line 93, in <module>
    run_mixmeth(config)
  File "/opt/GOMAP/code/gomap_mixmeth.py", line 31, in run_mixmeth
    submit_argot2(config)
  File "/opt/GOMAP/code/pipeline/run_argot2.py", line 176, in submit_argot2
    r_insert = s.post(argot_url,data=payload,files=files,headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 559, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 495, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', error(104, 'Connection reset by peer'))

So I manually uploaded test2.1.tsv.zip and test2.1.hmm.out.zip to Argot2.5 and downloaded the result file "result.zip". I placed result.zip in the argot2.5/results directory and renamed it to test2.1.tsv.zip. The aggregate step then gave an error message:

ubuntu@singularity:~/GOMAP1$ ./run-GOMAP-SINGLE.sh --step=aggregate --config=min-config.yml
/tmp:/tmp,/home/ubuntu/GOMAP1:/workdir,/home/ubuntu/GOMAP1/tmp:/tmpdir,
Running GOMAP --step=aggregate --config=min-config.yml
Running Aggregate Step
[1] "Reading the input file"
[1] "Converting to GAF 2.0"
[1] "Checking if data/data/go/go.obo.data exists"
[1] "data/data/go/go.obo.data exists so loading R object"
   user  system elapsed
  3.903   0.386   4.300
[1] "Writing the outfile"
[1] "Reading the input file"
[1] "Converting to GAF 2.0"
Error in .(QueryId, GO_class, Score) : could not find function "."
Calls: pannzer2gaf
Execution halted
Traceback (most recent call last):
  File "./gomap.py", line 103, in <module>
    aggregate(config)
  File "/opt/GOMAP/code/gomap_aggregate.py", line 29, in aggregate
    mixed2gaf(config)
  File "/opt/GOMAP/code/pipeline/mixed2gaf.py", line 8, in mixed2gaf
    check_output_and_run("test.pod",command)
  File "/opt/GOMAP/code/utils/basic_utils.py", line 24, in check_output_and_run
    subprocess.check_call(command,stdin=stdin_file,stdout=stdout_file)
  File "/usr/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['Rscript', 'code/pipeline/mixed2gaf.r', '/workdir/GOMAP-test2/test2.all.yml']' returned non-zero exit status 1

ubuntu@singularity:~/GOMAP1$ cat GOMAP-test2/logs/test2-aggregate.log
INFO [2021-10-17 04:24] Starting to run the pipline for test2
INFO [2021-10-17 04:24] Obtaining and aggregating Argot2.5 results
INFO [2021-10-17 04:24] Unzipping test2.1.tsv.zip
INFO [2021-10-17 04:24] Filtering mixed-method GAF
INFO [2021-10-17 04:24] test.pod not present so running command
Rscript code/pipeline/mixed2gaf.r /workdir/GOMAP-test2/test2.all.yml

I compared the file lists of the two tests. test1: made by test.sh; test2: made with the test FASTA as a regular input.

(file list diff finding)
ubuntu@singularity:~/GOMAP1$ find GOMAP-test2/ | sort | perl -ne 'chomp; /.+?\/(.+)$/; print "$1\n"' > test2.filelist.1
ubuntu@singularity:~/GOMAP1$ find test/GOMAP-0.3_GOMAP-input/ | sort | perl -ne 'chomp; /.+?\/.+?\/(.+)$/; print "$1\n"' > test1.filelist.1

test1 has files in gaf/a.mm_gaf/, gaf/b.raw_gaf/, gaf/c.uniq_gaf/, gaf/d.non_red_gaf/, and gaf/e.agg_data/, whereas test2 has nothing in gaf/c.uniq_gaf/, gaf/d.non_red_gaf/, or gaf/e.agg_data/. Also, there are no files in tmp/mixed-meth/pannzer/results/ for either test1 or test2. I checked this because the last message before the halt is "Calls: pannzer2gaf". Could this be related? Or did I simply do some step wrong? Thank you for your patience.
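For anyone repeating this comparison, the find/perl file-list diff above could also be done in Python. The following is a rough sketch, not what was actually run. The function names and the optional basename normalization are assumptions added for illustration: the two runs use different basenames (0.3_GOMAP-input and test2), so raw file names would otherwise never match.

```python
import os


def relative_files(root, basename=None):
    """Return the set of file paths under root, relative to root.

    If basename is given, each occurrence in a path is replaced by the
    placeholder "{basename}" so runs with different basenames line up.
    """
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if basename:
                rel = rel.replace(basename, "{basename}")
            found.add(rel)
    return found


def compare_trees(root_a, base_a, root_b, base_b):
    """Return (only_in_a, only_in_b) as sorted lists of relative paths."""
    a = relative_files(root_a, base_a)
    b = relative_files(root_b, base_b)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    # Directory names and basenames as used in this comment.
    only_1, only_2 = compare_trees(
        "test/GOMAP-0.3_GOMAP-input", "0.3_GOMAP-input",
        "GOMAP-test2", "test2",
    )
    print("only in test1:", *only_1, sep="\n  ")
    print("only in test2:", *only_2, sep="\n  ")
```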

wkpalan commented 3 years ago

Hey @wdlingit,

This is puzzling to me. The Argot2.5 error shown below indicates that the Argot server is declining the connection for some reason. You have worked around it with the manual upload, and that should take care of that issue.

requests.exceptions.ConnectionError: ('Connection aborted.', error(104, 'Connection reset by peer'))

It seems there is a connection issue between Singularity and Argot2.5, and I am not sure why.

The log file at GOMAP-singularity/test/GOMAP-0.3_GOMAP-input/logs/0.3_GOMAP-input-mixmeth.log should say the following.

INFO [2021-10-17 13:02] Starting to run the pipline for 0.3_GOMAP-input
INFO [2021-10-17 13:02] Running mixed-method based annotations
INFO [2021-10-17 13:02] Submitting the batch inputs to Argot2
INFO [2021-10-17 13:02] Submitting 0.3_GOMAP-input.2.tsv.zip and 0.3_GOMAP-input.2.hmm.out.zip to Argot2.5
DEBUG [2021-10-17 13:02] Starting new HTTP connection (1): www.medcomp.medicina.unipd.it:80
DEBUG [2021-10-17 13:02] http://www.medcomp.medicina.unipd.it:80 "POST /Argot2-5/form_batch.php HTTP/1.1" 200 5453
DEBUG [2021-10-17 13:02] http://www.medcomp.medicina.unipd.it:80 "POST /Argot2-5/insert_batch.php HTTP/1.1" 200 1432
INFO [2021-10-17 13:02] Submitting 0.3_GOMAP-input.1.tsv.zip and 0.3_GOMAP-input.1.hmm.out.zip to Argot2.5
DEBUG [2021-10-17 13:02] Starting new HTTP connection (1): www.medcomp.medicina.unipd.it:80
DEBUG [2021-10-17 13:02] http://www.medcomp.medicina.unipd.it:80 "POST /Argot2-5/form_batch.php HTTP/1.1" 200 5453
DEBUG [2021-10-17 13:02] http://www.medcomp.medicina.unipd.it:80 "POST /Argot2-5/insert_batch.php HTTP/1.1" 200 1439
INFO [2021-10-17 13:02] Running Pannzer
INFO [2021-10-17 13:02] /workdir/test/GOMAP-0.3_GOMAP-input/tmp/mixed-meth/pannzer/results/0.3_GOMAP-input.2_results.GO not present so running command
python run.py /workdir/test/GOMAP-0.3_GOMAP-input/tmp/mixed-meth/pannzer/conf/0.3_GOMAP-input.2.conf
INFO [2021-10-17 13:03] Step completed
INFO [2021-10-17 13:03] /workdir/test/GOMAP-0.3_GOMAP-input/tmp/mixed-meth/pannzer/results/0.3_GOMAP-input.1_results.GO not present so running command
python run.py /workdir/test/GOMAP-0.3_GOMAP-input/tmp/mixed-meth/pannzer/conf/0.3_GOMAP-input.1.conf
INFO [2021-10-17 13:03] Step completed
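To compare a mixmeth log against the expected one above, the following sketch scans a log for submissions that were never followed by a successful insert_batch.php POST. It assumes, based only on the logs in this thread, that each "Submitting ... to Argot2.5" line should be followed by an insert_batch.php line with status 200. A 200 response does not by itself guarantee that the server queued the job. The function name is hypothetical.

```python
import re

# Patterns taken from the log lines quoted in this thread.
SUBMIT_RE = re.compile(
    r"Submitting (\S+\.tsv\.zip) and (\S+\.hmm\.out\.zip) to Argot2\.5"
)
INSERT_OK_RE = re.compile(
    r'"POST /Argot2-5/insert_batch\.php HTTP/1\.1" 200\b'
)


def unconfirmed_submissions(log_text):
    """Return blast zip names with no insert_batch 200 line after them."""
    pending = None
    unconfirmed = []
    for line in log_text.splitlines():
        match = SUBMIT_RE.search(line)
        if match:
            # A new submission started before the previous one was confirmed.
            if pending is not None:
                unconfirmed.append(pending)
            pending = match.group(1)
        elif pending is not None and INSERT_OK_RE.search(line):
            pending = None
    if pending is not None:
        unconfirmed.append(pending)
    return unconfirmed


if __name__ == "__main__":
    # Log path as given in the comment above.
    path = "test/GOMAP-0.3_GOMAP-input/logs/0.3_GOMAP-input-mixmeth.log"
    with open(path) as handle:
        for name in unconfirmed_submissions(handle.read()):
            print("no insert_batch confirmation for", name)
```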

It seems odd that PANNZER is not producing any output. The test output should be what we see below in the pannzer directory.

cd GOMAP-singularity/test/GOMAP-0.3_GOMAP-input/tmp/mixed-meth/pannzer && find -type f

./results/0.3_GOMAP-input.2.clusters
./results/0.3_GOMAP-input.2_results.DE
./results/0.3_GOMAP-input.1_results.DE
./results/0.3_GOMAP-input.1.clusters
./results/0.3_GOMAP-input.2_results.GO
./results/0.3_GOMAP-input.1_results.GO
./conf/0.3_GOMAP-input.2.conf
./conf/0.3_GOMAP-input.1.conf
./blast/0.3_GOMAP-input.2.xml
./blast/0.3_GOMAP-input.1.xml

Do you want to set up a time to talk on a call to figure this out?

Please contact me at kokul@bioinformapping.com if that would work.

Best Kokul

yxl8241 commented 3 years ago

I got the same error. However, when I did a test run with the source code downloaded from the GOMAP repo, I found that the requests are sent successfully after removing the content type of the file objects, which is specified as 'text/plain' (at lines 145 and 146 of run_argot2.py).
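For readers who do not have run_argot2.py open, here is a rough illustration of what "removing the content type of the file object" means for the files= argument of requests. It is not the actual GOMAP code: the form field names blast_file and hmmer_file and the helper build_files are made up for this sketch. What is real is that requests accepts either a 2-tuple (filename, content) or a 3-tuple (filename, content, content_type) for each uploaded file.

```python
import os


def build_files(blast_zip, hmmer_zip, content_type=None):
    """Build a mapping in the shape accepted by requests' files= argument.

    With content_type=None each part is a 2-tuple (filename, data), so
    requests does not send the explicit per-file Content-Type. Passing
    "text/plain" reproduces the 3-tuple form that was reported to fail.
    The field names below are hypothetical, not the real Argot2.5 ones.
    """
    def part(path):
        with open(path, "rb") as handle:
            data = handle.read()
        name = os.path.basename(path)
        if content_type is None:
            return (name, data)
        return (name, data, content_type)

    return {"blast_file": part(blast_zip), "hmmer_file": part(hmmer_zip)}


# Intended use, mirroring the call shown in the traceback:
#   files = build_files("x.1.tsv.zip", "x.1.hmm.out.zip")
#   r_insert = s.post(argot_url, data=payload, files=files, headers=headers)
```

Why the server resets the connection when the parts are labeled text/plain is not established in this thread. The sketch only shows the shape of the change.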

wdlingit commented 3 years ago

Just to be clear: yxl8241 is my colleague. I am testing what yxl8241 suggested with a modified GOMAP sif.

wdlingit commented 3 years ago

I just tested the modified GOMAP sif (with the changes suggested by yxl8241) twice: (i) test3: running test.sh and (ii) test4: running the GOMAP steps with the test FASTA as a regular input. All steps succeeded in both tests. There are some minor differences between the final outputs (aggregate.gaf), but I think these might come from whether BLAST was run on split query files or not (which usually means different e-values).

Back to test1, for which I got one Argot2.5 job submission email and one completion email, and no result files in the pannzer folders. This can now be explained (my thanks to you and yxl8241): with test.sh, the input FASTA was split into two files for processing; submission of the first one succeeded and the second one failed. The mixmeth step simply stops when any Argot2.5 submission fails, so the pannzer part was never invoked. Accordingly, for test3, I did receive two job submission emails and two job completion emails. I also got result files in the pannzer folders.

It seems puzzling to me that the same code can sometimes successfully submit files to the argot2.5 server (with xen VM, WSL VM, and real server). It also seems to me that the changes suggested by yxl8241 just work.

wkpalan commented 3 years ago

Hey @yxl8241 and @wdlingit,

  1. Thank you for the detailed testing and troubleshooting. I appreciate the effort for this including testing based on the source code.
  2. @yxl8241, could you please send a PR with the changes that fixed the error for you to https://github.com/Dill-PICL/GOMAP? I can update the code, but I would like to give you credit for fixing the bug. Let me know if that's too much work and I will push the changes myself and add your name to the README.
  3. @wdlingit, I agree this seems like a weird bug to me too. I tested this on WSL, Azure VM, and the ISU HPC cluster. Seemed to work for me, but I think this is because I have committed the Argot2.5 files. My current check is the email notification for job submission and job completion. I have not had any issues with that so far. I am glad that the current fix works. Good luck with annotations.

yxl8241 commented 3 years ago

I just made a PR in the source code repository. I hope I did it right, though.

wkpalan commented 3 years ago

Hey @yxl8241 ,

Could you please send the PR to the dev branch? I usually test it in the dev branch and then merge it to master. Unfortunately GH doesn't let me change the branch after you initiated the PR.

Best Kokul

yxl8241 commented 3 years ago

I just made another attempt. Please let me know if it doesn't work.

wkpalan commented 3 years ago

Thanks @yxl8241, that works.