emo-bon / MetaGOflow

MGnify-oriented implementation of the Marine Genomic Observatories pipeline, developed in the framework of an EOSC-Life-funded project
https://metagoflow.readthedocs.io
Apache License 2.0

Update of functional annotation software #48

Closed. cymon closed this pull request 4 months ago.

cymon commented 4 months ago

I've updated InterProScan (IPS) to version 5.64-96.0 (Sept 2023) and changed the Dockerfile so that IPS now runs on AMD EPYC CPUs (issue https://github.com/emo-bon/MetaGOflow/issues/45). While I was at it, I also updated eggNOG and HMMER to their latest versions.

cymon commented 4 months ago

Hi Haris,

On Fri, 3 May 2024 at 16:26, Haris Zafeiropoulos @.***> wrote:

@.**** commented on this pull request.

Thanks for the PR @cymon! Great stuff! I think this PR really highlights the greatness of modern workflows!

My suggestion would be, before merging into the eosc-life-gos branch (which works as our main), to address the PR to the develop branch first (@steninidak, please make sure it's up to date with eosc-life-gos), so that Stelios can give it a shot on zorbas as well.

Yes, that makes more sense. I'll open another PR asap.

The code looks fine, but there is no way we can have an automated check on GitHub for this workflow, so to check whether the new tools work, they actually have to be run somewhere.

In Installation/templates/default.yml https://github.com/emo-bon/MetaGOflow/pull/48#discussion_r1589331399:

@@ -188,7 +188,7 @@ ko_file:

InterProScan_applications:
  - Pfam
-  - TIGRFAM

Just out of curiosity, are those (TIGRFAM) included in NCBIfam? I found this https://www.ncbi.nlm.nih.gov/refseq/annotation_prok/tigrfams/ but I still do not get it.

On the InterProScan "How to run" page (https://interproscan-docs.readthedocs.io/en/latest/HowToRun.html) it says: "NCBIFAM (includes the previous TIGRFAM analysis)", with NCBIFAM linking to https://www.ncbi.nlm.nih.gov/genome/annotation_prok/evidence/ and TIGRFAM to http://www.jcvi.org/cgi-bin/tigrfams/index.cgi. I think they have added other models and renamed the collection: "Other components of that collection are NCBIFAMs (models built from scratch by NCBI curators) and models derived from a curated collection of protein clusters (PRK)."
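Because the IPS documentation quoted above says NCBIfam now includes the former TIGRFAM analysis, requesting NCBIfam is presumably how TIGRFAM-derived matches are obtained with IPS 5.64. The shell sketch below shows how an application list such as InterProScan_applications could be turned into the comma-separated value of interproscan.sh's -appl option. It assumes that MetaGOflow passes the list this way, which this thread does not show. The helper join_appl and the input file proteins.faa are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: joins InterProScan application names into the
# comma-separated value taken by interproscan.sh's -appl option.
join_appl() {
    out=""
    for app in "$@"; do
        if [ -z "$out" ]; then
            out="$app"
        else
            out="$out,$app"
        fi
    done
    printf '%s\n' "$out"
}

# NCBIfam is listed where TIGRFAM used to be (assumption based on the IPS docs).
appl=$(join_appl Pfam NCBIfam)

# Print the command line that would be run; proteins.faa is a placeholder.
echo "interproscan.sh -i proteins.faa -appl $appl"
```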


In tools/Assembly/EggNOG/Dockerfile https://github.com/emo-bon/MetaGOflow/pull/48#discussion_r1589336817:

RUN apk add --no-cache bash git build-base zlib-dev python3-dev cmake linux-headers
RUN python3 -m pip install psutil biopython

# install diamond

-RUN wget https://github.com/bbuchfink/diamond/archive/v$VERSION_DIAMOND.tar.gz && \

Just FYI, in Docker it is good practice to split long RUN commands over several lines separated by \ ; see https://docs.docker.com/develop/develop-images/instructions/#run.

Noted! Thanks.
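To illustrate the suggestion: Docker joins the backslash-continued lines of a RUN instruction and passes the result to /bin/sh -c (the default shell form on Linux). A single RUN can therefore be laid out one step per line, chained with && so that a failing step stops the build. The plain-sh sketch below mimics this style without Docker. The step names are placeholders, and false stands in for a failing download or build.

```shell
#!/bin/sh
# Mimics a multi-line RUN body: one logical command split with "\"
# and chained with "&&", so the first failing step skips the rest.
log=$(mktemp)

# Every step succeeds, so every step is recorded.
echo "download" >> "$log" && \
    echo "build" >> "$log" && \
    echo "cleanup" >> "$log"

# The second step fails, so "&&" skips the remaining step.
echo "download-again" >> "$log" && \
    false && \
    echo "cleanup-again" >> "$log"

steps=$(cat "$log")
echo "$steps"
rm -f "$log"
```

Inside a Dockerfile, the non-zero exit status of such a chain makes the RUN instruction, and therefore the build, fail.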


In tools/InterProScan/Dockerfile https://github.com/emo-bon/MetaGOflow/pull/48#discussion_r1589342249:

@@ -70,41 +65,17 @@
COPY --from=buildcore /opt/interproscan /opt/interproscan
COPY --from=buildbin /opt/interproscan/bin /opt/interproscan/bin

ENV PATH="/opt/interproscan/:/opt/interproscan/bin:${PATH}"

-RUN sed -i 's/http:\/\/www.ebi.ac.uk\/interpro\/match-lookup//' /opt/interproscan/interproscan.properties

Finally! It's really nice seeing that this is not needed any more! 🚀
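The match-lookup substitution, in both its old http form and the https form used in the new RUN chain, deletes the lookup-service URL from interproscan.properties. Presumably this stops IPS from querying EBI's precalculated-match lookup service. The sketch below runs the same substitution on a throwaway file. The property key is illustrative only, and "|" is used as the sed delimiter so the slashes in the URL need no escaping.

```shell
#!/bin/sh
# Runs the Dockerfile's match-lookup substitution on a throwaway file.
props=$(mktemp)

# Hypothetical property line; the key name is illustrative only.
echo 'precalculated.match.lookup.service.url=https://www.ebi.ac.uk/interpro/match-lookup' > "$props"

# Same edit as s/https:\/\/www.ebi.ac.uk\/interpro\/match-lookup// but with
# "|" as the delimiter. Deleting the URL leaves the key with an empty value.
sed -i 's|https://www.ebi.ac.uk/interpro/match-lookup||' "$props"

result=$(cat "$props")
echo "$result"
rm -f "$props"
```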

In tools/InterProScan/Dockerfile https://github.com/emo-bon/MetaGOflow/pull/48#discussion_r1589345612:

-RUN tar -pxzf interproscan-core-$IPRSCAN.tar.gz \
-    -C /opt/interproscan --strip-components=1 && \
-    rm -f interproscan-core-$IPRSCAN.tar.gz interproscan-core-$IPRSCAN.tar.gz.md5
+RUN wget -q -O /opt/interproscan-core-$IPRSCAN.tar.gz ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/$IPR/$IPRSCAN/alt/interproscan-core-$IPRSCAN.tar.gz && \
+    wget -q -O /opt/interproscan-core-$IPRSCAN.tar.gz.md5 ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/$IPR/$IPRSCAN/alt/interproscan-core-$IPRSCAN.tar.gz.md5 && \
+    md5sum -c interproscan-core-$IPRSCAN.tar.gz.md5 && \
+    tar -pxvf interproscan-core-$IPRSCAN.tar.gz -C /opt/interproscan --strip-components=1 && \
+    rm -f interproscan-core-$IPRSCAN.tar.gz interproscan-core-$IPRSCAN.tar.gz
+RUN sed -i 's/https:\/\/www.ebi.ac.uk\/interpro\/match-lookup//' /opt/interproscan/interproscan.properties && \
+    sed -i '20i binary.prosite.pfsearchv3.path=${bin.directory}/prosite/pfsearchV3\n' /opt/interproscan/interproscan.properties && \

Does this work in general? Is there any chance you have set an environment variable? I am not sure whether ${bin.directory} will get a value here.

I don't remember doing so.
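A small offline sketch bears on the ${bin.directory} question. Because the sed expression is in single quotes, the shell never treats ${bin.directory} as a variable, so no environment variable is involved. The placeholder is written into the properties file verbatim. The next step is an assumption, not verified here: stock interproscan.properties appears to define bin.directory and to reference it as ${bin.directory} in other binary paths, so IPS presumably resolves the new line the same way when it loads the file. A throwaway 25-line file stands in for interproscan.properties. The trailing \n from the Dockerfile is left out so that the result does not depend on sed-specific escape handling.

```shell
#!/bin/sh
# Shows what "sed -i '20i ...'" writes, using a throwaway stand-in file.
props=$(mktemp)

# Hypothetical stand-in for interproscan.properties: 25 numbered lines.
i=1
while [ "$i" -le 25 ]; do
    echo "line.$i=value" >> "$props"
    i=$((i + 1))
done

# Single quotes keep ${bin.directory} away from the shell (unquoted, the
# shell would report a bad substitution), so it reaches the file literally.
# "20i" inserts the text as a new line before the current line 20.
sed -i '20i binary.prosite.pfsearchv3.path=${bin.directory}/prosite/pfsearchV3' "$props"

line20=$(sed -n '20p' "$props")
line21=$(sed -n '21p' "$props")
echo "$line20"
echo "$line21"
rm -f "$props"
```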

--


Cymon J. Cox

Senior Researcher Plant Systematics and Bioinformatics Digital Laboratory Centro de Ciencias do Mar (CCMAR) - CIMAR-Lab. Assoc.

Mailing address: CCMAR - Centro de Ciencias do Mar, Universidade do Algarve Campus de Gambelas Edif. 7 8005-139 Faro Portugal

Phone: +351 289800051 ext 7380 Fax: +351 289800051 Email: @.***

@CCMAR: https://ccmar.ualg.pt/users/cymon
Google Scholar: https://scholar.google.co.uk/citations?user=f5M7DhkAAAAJ&hl=en&oi=ao
Scopus: http://www.scopus.com/inward/authorDetails.url?authorID=7402112716&partnerID=MN8TOARS
ORCID: http://orcid.org/0000-0002-4927-979X
CienciaVitae: https://www.cienciavitae.pt/6B15-9771-1D04
GPG: Public key on keyserver.ubuntu.com


cymon commented 4 months ago

Closing and reopening for merge into develop.