StaPH-B / docker-builds

:package: :whale: Dockerfiles and documentation on tools for public health bioinformatics
GNU General Public License v3.0

adds pangolin 4.3.1 with new pdata 1.30 #1052

Status: Closed (kapsakcj closed this 3 weeks ago)

kapsakcj commented 3 weeks ago

Changes from previous dockerfile

code diff:

$ diff pangolin/4.3.1-pdata-1.29/Dockerfile pangolin/4.3.1-pdata-1.30/Dockerfile 
1c1
< FROM mambaorg/micromamba:1.5.8 AS app
---
> FROM mambaorg/micromamba:1.5.10 AS app
12c12
< ARG PANGOLIN_DATA_VER="v1.29"
---
> ARG PANGOLIN_DATA_VER="v1.30"
18c18
< LABEL base.image="mambaorg/micromamba:1.5.8"
---
> LABEL base.image="mambaorg/micromamba:1.5.10"
126c126
<  unzip OQ381818.1.zip && rm OQ381818.1.zip && \
---
>  unzip -o OQ381818.1.zip && rm OQ381818.1.zip && \
135c135
< unzip OR177999.1.zip && rm OR177999.1.zip && \
---
> unzip -o OR177999.1.zip && rm OR177999.1.zip && \
144c144
< unzip OR461132.1.zip && rm OR461132.1.zip && \
---
> unzip -o OR461132.1.zip && rm OR461132.1.zip && \
153c153
< unzip OR598183.1.zip && rm OR598183.1.zip && \
---
> unzip -o OR598183.1.zip && rm OR598183.1.zip && \
164c164
< unzip OR716684.1.zip && rm OR716684.1.zip && \
---
> unzip -o OR716684.1.zip && rm OR716684.1.zip && \
174c174
< unzip PP189069.1.zip && rm PP189069.1.zip && \
---
> unzip -o PP189069.1.zip && rm PP189069.1.zip && \
185c185
< unzip PP218754.1.zip && rm PP218754.1.zip && \
---
> unzip -o PP218754.1.zip && rm PP218754.1.zip && \
195c195
< unzip PP770375.1.zip && rm PP770375.1.zip && \
---
> unzip -o PP770375.1.zip && rm PP770375.1.zip && \
204c204
< unzip PQ073669.1.zip && rm PQ073669.1.zip && \
---
> unzip -o PQ073669.1.zip && rm PQ073669.1.zip && \
208c208,217
< column -t -s, PQ073669.1-usher/lineage_report.csv
\ No newline at end of file
---
> column -t -s, PQ073669.1-usher/lineage_report.csv
> 
> # new lineage MC.2 that was introduced in pango-designation v1.30: https://github.com/cov-lineages/pango-designation/commit/c64dbc47fbfbfd7f4da011deeb1a88dd6baa45f1#diff-a121ea4b8cbeb4c0020511b5535bf24489f0223cc83511df7b8209953115d329R2564181
> # genome on NCBI: https://www.ncbi.nlm.nih.gov/nuccore/PQ034842.1
> RUN datasets download virus genome accession PQ034842.1 --filename PQ034842.1.zip && \
> unzip -o PQ034842.1.zip && rm PQ034842.1.zip && \
> mv -v ncbi_dataset/data/genomic.fna PQ034842.1.genomic.fna && \
> rm -vr ncbi_dataset/ README.md && \
> pangolin PQ034842.1.genomic.fna -o PQ034842.1-usher && \
> column -t -s, PQ034842.1-usher/lineage_report.csv
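The new test only prints the report with `column`, so a wrong lineage assignment would not fail the build. Below is a minimal sketch of a stricter check. It assumes the expected lineage appears as its own comma-delimited field somewhere in `lineage_report.csv`. `check_lineage` is a hypothetical helper and is not part of this Dockerfile.

```shell
# Hypothetical helper (not in the Dockerfile): succeed only if the
# pangolin report contains the expected lineage as a whole CSV field.
check_lineage() {
  report="$1"    # path to lineage_report.csv
  expected="$2"  # lineage name, e.g. MC.2
  # -F matches a fixed string, so the dot in "MC.2" is not a regex wildcard.
  # The surrounding commas keep "MC.2" from matching "MC.20".
  # Assumption: no other column holds the same value.
  grep -qF ",${expected}," "$report"
}
```

If adopted, it could be chained onto the existing test, for example `check_lineage PQ034842.1-usher/lineage_report.csv MC.2`, so that the `RUN` step exits non-zero when the assignment differs. Matching on the delimited string rather than on a column index sidesteps the quoted comma inside the taxon field.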

Pull Request (PR) checklist:

kapsakcj commented 3 weeks ago

Does anyone have the bandwidth to review this PR?

erinyoung commented 3 weeks ago

It looks like the tests work.

#19 [test  7/17] RUN datasets download virus genome accession ON924087.1 --filename ON924087.1.zip &&  unzip ON924087.1.zip && rm ON924087.1.zip &&  mv -v ncbi_dataset/data/genomic.fna ON924087.1.genomic.fna &&  rm -vr ncbi_dataset/ README.md &&  pangolin ON924087.1.genomic.fna -o ON924087.1-usher &&  column -t -s, ON924087.1-usher/lineage_report.csv
#19 0.658 Downloading: ON924087.1.zip    847B 11.2MB/s
#19 0.964 Downloading: ON924087.1.zip    2.02kB 6.66kB/s
#29 12.88 ****
#29 12.88 Data files found:
#29 12.88 usher_pb: /opt/conda/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/lineageTree.pb
#29 12.88 ****
#29 12.88 ****
#29 12.88 Output file written to: /data/PQ034842.1-usher/lineage_report.csv
#29 12.99 taxon                                                                                                           lineage            conflict  ambiguity_score  scorpio_call  scorpio_support      scorpio_conflict  scorpio_notes  version                                                                    pangolin_version  scorpio_version  constellation_version  is_designated  qc_status  qc_notes  note                    
#29 12.99 "PQ034842.1 Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/NY-CDC-LC1108001/2024   complete genome"  MC.2      0.0                            Omicron (BA.2-like)  0.92              0.02           scorpio call: Alt alleles 57; Ref alleles 1; Amb alleles 0; Oth alleles 4  PANGO-v1.30       4.3.1            0.3.19                 v0.1.12        True       pass      Ambiguous_content:0.02  Assigned from designation hash.
#29 DONE 13.0s

erinyoung commented 3 weeks ago

Thank you for putting this together!

I'm going to deploy this to Docker Hub and Quay as staphb/pangolin, using the tags '4.3.1-pdata-1.30' and 'latest'.