KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

Fix/Create mock data subworkflow #206

Closed chasemc closed 2 years ago

chasemc commented 2 years ago

Part of, but does not fix: https://github.com/KwanLab/Autometa/issues/152

chasemc commented 2 years ago

This PR allows creating mock contigs from an input set of GenBank assembly accessions. It also creates two minimal reports at the end showing the binning results: one colored by genus (parsed from the name) and one colored by assembly accession.

Example output: mock_data_reports.zip
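As a rough illustration of the inputs described above, here is a minimal Python sketch that checks whether strings look like NCBI assembly accessions and derives a genus label from an organism name. It is not code from this PR: the helper names (`is_assembly_accession`, `genus_from_name`, `split_accessions`) are hypothetical, and taking the first word of the name as the genus is an assumption about how "parsed from name" works.

```python
import re

# NCBI assembly accessions: GCA_ (GenBank) or GCF_ (RefSeq),
# followed by nine digits, a dot, and a version number.
ACCESSION_RE = re.compile(r"GC[AF]_\d{9}\.\d+")


def is_assembly_accession(value: str) -> bool:
    """Return True if value looks like an NCBI assembly accession."""
    return ACCESSION_RE.fullmatch(value.strip()) is not None


def genus_from_name(organism_name: str) -> str:
    """Hypothetical helper: assume the genus is the first word of the name."""
    words = organism_name.split()
    return words[0] if words else "unknown"


def split_accessions(values):
    """Partition raw inputs into (valid, invalid) lists, preserving order."""
    valid, invalid = [], []
    for value in values:
        cleaned = value.strip()
        if is_assembly_accession(cleaned):
            valid.append(cleaned)
        else:
            invalid.append(cleaned)
    return valid, invalid
```

The first-word heuristic would mislabel names that begin with a qualifier such as "Candidatus", so a real report would probably need a more careful rule.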

chasemc commented 2 years ago

Notes: Mock data reports should write out to the main output folder.

To run the pipeline with mock data, set the parameter --mock_test true

chasemc commented 2 years ago

A couple of things to fix (or not) before merging in (@WiscEvan I don't think I'll have time to do these today)

1) This process needs a docker image. @ajlail98 could maybe look around to find one? https://github.com/KwanLab/Autometa/blob/0ec00874d11238ae400fb1f3d72212dd38aa717f/modules/local/get_genomes_for_mock.nf#L10

2) This one also: https://github.com/KwanLab/Autometa/blob/b73850e9c37ee49572c05af504040fd57d382a7d/modules/local/mock_data_reporter.nf#L14-L15 I have an example there, but it has to be built first. Maybe that's okay if the mock data is only going to be used by developers, in which case instructions to build the image first could be provided. Note: that Dockerfile is a modified version of https://github.com/rocker-org/rocker/blob/master/r-rmd/Dockerfile with procps also installed (required by Nextflow), so the Rocker project license would have to be included.

3) Last, just a note that when I happened to run this with "GCF_013307045.1", it failed because no markers were found. May be worth looking into.

evanroyrees commented 2 years ago

:memo: I've added a tag to the GET_GENOMES_FOR_MOCK process in get_genomes_for_mock.nf so the user can easily tell how many genomes are being fetched for the mock community.

Runtime Note

nextflow run . -profile docker -params-file "nf-params.json" --mock_test true --input .

nf-params.json

{
    "autometa_image_tag": "dev"
}
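For anyone scripting the run above, here is a minimal Python sketch that writes the same nf-params.json and assembles the same command as an argument list. The function names (`write_params`, `build_command`) are hypothetical, and the sketch only builds the command; actually launching it (for example with `subprocess.run`) assumes Nextflow and Docker are installed and that the working directory is the Autometa repository root.

```python
import json
from pathlib import Path


def write_params(path, image_tag="dev"):
    """Write the params file shown above and return its contents as a dict."""
    params = {"autometa_image_tag": image_tag}
    Path(path).write_text(json.dumps(params, indent=4) + "\n")
    return params


def build_command(params_file, profile="docker", mock_test=True, input_path="."):
    """Assemble the nextflow invocation from the runtime note as an argv list."""
    return [
        "nextflow", "run", ".",
        "-profile", profile,
        "-params-file", str(params_file),
        "--mock_test", "true" if mock_test else "false",
        "--input", input_path,
    ]


if __name__ == "__main__":
    write_params("nf-params.json")
    # Print the command rather than run it, so the sketch has no side effects
    # beyond writing the params file.
    print(" ".join(build_command("nf-params.json")))
```

Passing the arguments as a list avoids shell quoting issues if the params file path contains spaces.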

Dockerfiles

I've also added Dockerfiles for the processes you've mentioned. I was not sure where to put these, so I've opted to place them in a $HOME/Autometa/docker/modules sub-directory. If you have guidance on where these should be placed, feel free to move them. If you do move them, the Makefile command modules-images will need to be updated to match the new paths.

For example, to build all Autometa Nextflow module Docker images from the Makefile:

make modules-images

A couple of things to fix (or not) before merging in

  1. This process needs a docker image (modules/local/get_genomes_for_mock.nf)
  2. This one also: https://github.com/KwanLab/Autometa/blob/b73850e9c37ee49572c05af504040fd57d382a7d/modules/local/mock_data_reporter.nf#L14-L15

    I have an example there, but it has to be built first. Maybe that's okay if the mock data is only going to be used by developers, in which case instructions to build the image first could be provided. Note: that Dockerfile is a modified version of https://github.com/rocker-org/rocker/blob/master/r-rmd/Dockerfile with procps also installed (required by Nextflow), so the Rocker project license would have to be included.

I've used conda to create the R environment.