Closed Bontempogianpaolo1 closed 3 years ago
To Dos:
I built a Nextflow pipeline (fiveProcesses.nf) composed of five simple processes written in Python and Perl (I also tried Java but ran into some problems) that takes a simple text file (prova.txt) as input.
I pushed the project to GitHub (federicacitarrella/pipelineGeneFusions) using the following commands:
git init
git remote add origin https://github.com/federicacitarrella/pipelineGeneFusions.git
git config --global user.email "federica.citarrella14@gmail.com"
git add <filename> / git rm --cached <filename>
git commit -m '...'
git push [--set-upstream origin master]
Then I integrated the project with Docker. I created two simple Dockerfiles:
Dockerfile (1):
FROM ubuntu
RUN apt-get -y update
RUN apt-get -y install python3
Dockerfile (2):
FROM ubuntu
RUN apt-get -y update
RUN apt-get -y install perl
I built two Docker images from these Dockerfiles:
sudo docker build -t '<image_name>' <path_to_the_directory_of_dockerfile>
I created a public Docker Hub repository to share the images: federicacitarrella/dockertest
Then I pushed the images using the following commands:
sudo docker login -u <username>
sudo docker tag <image_name> federicacitarrella/dockertest:<image_name>
sudo docker push federicacitarrella/dockertest:<image_name>
To pull the images, use the following command:
sudo docker pull federicacitarrella/dockertest:<image_name>
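The tag and push steps follow a single naming pattern, so they can be sketched as a small shell helper. This is only an illustration: the function names `hub_ref` and `publish_cmds` are hypothetical, the image names `image1` and `image2` are assumed from the run command used later in this thread, and the helper prints the commands instead of running them, so it works without Docker installed.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the original project).
# Repository name taken from this thread.
REPO="federicacitarrella/dockertest"

# Build the full Docker Hub reference for a local image name.
hub_ref() {
    printf '%s:%s\n' "$REPO" "$1"
}

# Print (dry run) the commands that tag and push one local image.
publish_cmds() {
    ref=$(hub_ref "$1")
    printf 'sudo docker tag %s %s\n' "$1" "$ref"
    printf 'sudo docker push %s\n' "$ref"
}

# Show the commands for both images without executing them.
publish_cmds image1
publish_cmds image2
```

Piping the output to `sh` would execute the printed commands, but reviewing them first is probably the safer choice.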
Then I modified fiveProcesses.nf to specify the image used by each process (multiple-container approach), in the following format:
process name {
container 'image_name'
'''
do this
'''
}
Finally, to run the pipeline with Docker, I ran the following command:
sudo ./nextflow run fiveProcesses.nf -with-docker federicacitarrella/dockertest:image1 federicacitarrella/dockertest:image2
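As an alternative to naming images on the command line, Nextflow can read Docker settings from a `nextflow.config` file placed next to the script. The sketch below writes such a file from the shell. The process names `pythonStep` and `perlStep` are hypothetical placeholders (the real process names in fiveProcesses.nf are not shown in this thread), and the `CONFIG_FILE` variable is only there to make the target path adjustable.

```shell
#!/bin/sh
# Write a minimal nextflow.config that enables Docker and assigns
# one image per process. Process names are hypothetical placeholders.
CONFIG_FILE="${CONFIG_FILE:-nextflow.config}"

cat > "$CONFIG_FILE" <<'EOF'
docker.enabled = true

process {
    withName: 'pythonStep' {
        container = 'federicacitarrella/dockertest:image1'
    }
    withName: 'perlStep' {
        container = 'federicacitarrella/dockertest:image2'
    }
}
EOF

echo "wrote $CONFIG_FILE"
```

With this file in place, `./nextflow run fiveProcesses.nf` should pick up the Docker settings without the `-with-docker` option. A `container` directive inside each process definition serves the same purpose, so the config selectors are only needed if the images are not already declared in the script.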
Very well done! I have just a few comments:
Sorry, I did some research but I didn't understand what I should put in these two bash scripts in this case.
A good practice could be to set up a local virtual environment (Ubuntu) and test the pipelines on small datasets.
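One way to follow this advice is to cut a small sample out of the input file and run the pipeline on that first. A minimal sketch, assuming the input is line-oriented text such as prova.txt. The function name `make_sample`, the default sample size, and the file names in the demo are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: copy the first N lines of a file into a small sample.
make_sample() {
    input=$1
    output=$2
    lines=${3:-100}   # default sample size is an arbitrary assumption
    head -n "$lines" "$input" > "$output"
}

# Demo on a throwaway file so the sketch runs on its own.
tmpdir=$(mktemp -d)
seq 1 1000 > "$tmpdir/full.txt"
make_sample "$tmpdir/full.txt" "$tmpdir/sample.txt" 10

# Print the number of lines in the sample.
wc -l < "$tmpdir/sample.txt"
```

The sample file can then be given to the pipeline in place of the full input, in whatever way fiveProcesses.nf reads its input.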