MaestSi / MetONTIIME

A Meta-barcoding pipeline for analysing ONT data in QIIME2 framework
GNU General Public License v3.0

Assistance with MetONTIIME #74

Closed. annegilewski closed this issue 1 year ago.

annegilewski commented 1 year ago

Good afternoon, Dr. Maestri.

My apologies if this isn't the appropriate topic for this space. I am attempting to set up MetONTIIME (MOT) to analyze 16S barcoded fastq files that were sequenced on a MinION Mk1C (and basecalled with Guppy). I have six pools of data, each with a variable number of barcodes containing multiple fastq files. The MinION writes reads in batches of 4,000 (roughly 7 MB per file), so I have a lot of data in big files. I don't have a complete metadata file created yet, but I am working on that.

Once I got Nextflow, MOT and Docker set up, I ran the usage command, indicating where the input and output directories are located. The MOT files are in my $PATH in /usr/local/bin along with Nextflow, per the instructions.

I'm sharing a link to a .txt file that shows where the CPU issue popped up. I saw in the forum that the metontiime2.conf file can be adjusted, but I was unable to view or edit the text using nano or vim in Terminal.

I am running the program on a MacBook and an iMac, which I understand may be untested waters. My colleague ran this program on AWS, so I was wondering whether that might be a better option?

Additionally, and this is my own ignorance, there are several posts (#20 in particular) that mention other scripts referenced in the README. I read through it and don't see Launch_MinION_mobile_lab.sh or Evaluate_diversity.sh. I am absolutely missing some salient points in the startup, and was wondering if you have insight into my approach, or whether MOT may not be appropriate?

Thank you, Anne

https://unhnewhaven-my.sharepoint.com/:w:/g/personal/agile2_unh_newhaven_edu/Ec1qkuNvG_dEv2IBxwu8W_8B9qXJ0b1XkQEqKBi1BqOH2g?e=TZbd1I

MaestSi commented 1 year ago

Hi Anne, the other scripts you see mentioned in older issues are related to v1 of the MetONTIIME pipeline (still available in the "v1" branch of this repository, but deprecated). Based on the logs you provided, I would suggest the following. First, always use full paths (for --resultsDir and --workDir, for example). Second, you should be able to open the metontiime2.conf file with any text editor and adjust the options accordingly; see this issue for advice on how to do that. In particular, the pipeline is complaining that the importDb process requests 6 CPUs, but your Mac only has 4, so, for each process, you should request at most 4 CPUs. You can specify parameters either in the metontiime2.conf file or at runtime with --parameterName; if you do both for a specific parameter, the --parameterName value will overwrite the value in the conf file. As a last point, I would suggest mounting the relevant directories, so that Docker can access them. Say your data are in the /Users/name directory; then line 171 of the metontiime2.conf script should contain:

containerOptions = '-v /Users/name/:/Users/name'
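For context, the per-process resource requests live in the same metontiime2.conf file. A minimal sketch of what the CPU request for importDb might look like, assuming the file uses Nextflow's standard withName process selectors (the value shown is illustrative):

process {
    withName:importDb {
        cpus = 4    // never request more CPUs than the machine provides
    }
}

Overriding parameters at runtime instead would look roughly like this, assuming the metontiime2.nf entry script referenced in the README and the example paths above:

nextflow -c metontiime2.conf run metontiime2.nf \
  --workDir /Users/name/work \
  --resultsDir /Users/name/results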

Let me know if you are able to run the pipeline after these adjustments. I would suggest starting with the toy dataset available in this repository for pipeline set-up and, once that runs, trying with your dataset. Best, SM

annegilewski commented 1 year ago

Thank you so much for your guidance. I think I get the gist of what I need to do (sorry, I'm very new to command-line analysis outside of R). I'll take a look tomorrow and will follow up. Do you prefer communications through GitHub so that they are archived for other users? Best, Anne

MaestSi commented 1 year ago

Yes, it’s better if you write here on GitHub. SM

annegilewski commented 1 year ago

I was able to change the CPUs in each process and fix the directories, so the pipeline showed up in Docker! Big win in that column. Unfortunately, the importDb process stopped due to insufficient memory (10 GB needed, and I have 8 on my laptop). I saw on a forum that it's best not to adjust those parameters in the config file? I was going to try on a desktop Mac and see if I can get it to run with a data pool.

MaestSi commented 1 year ago

Hi, I think you should try decreasing the RAM requirements too, so that they are compatible with the available resources. However, the pipeline may work on a laptop with just a small database and dataset. You may try running a test with the toy dataset and the 16S NCBI Bacterial db. SM
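For reference, the RAM request can be lowered the same way as the CPU request; a sketch, again assuming standard Nextflow selector syntax in metontiime2.conf (the figure is illustrative and should stay below the machine's physical memory):

process {
    withName:importDb {
        memory = '6 GB'    // the laptop has 8 GB physical RAM, so request less
    }
}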

annegilewski commented 1 year ago

Good note. My data pools are huge, so I guess I would have to split the pools up. Would I then load the output files into the QIIME 2 viewer? We are hoping to have each pool analyzed separately so we can compare--my project is looking at microbial biofilm communities on microplastic pellets versus stone substrate.

I'll try the toy dataset as well. At this point, I'm still trying to understand what I'm doing! On another note, I missed that I have to assign a database for the run. Could I use the script for the EPI2ME database referenced in the README here?

MaestSi commented 1 year ago

Yes, you could go for that one. I suggest using the laptop for learning how to use the pipeline, and then scaling up on the more powerful desktop. Best, SM

annegilewski commented 1 year ago

Sorry--one more comment--I got the BioProject 33175 FASTA file and changed the code below. I'm unclear on the taxonomy file: do I need to create a file from NCBI for that, or does the code below fetch it? And also, how does R fit in? I'm so sorry--thanks for all of your help.

This R script requires an R installation with the taxize and Biostrings packages installed. For example, if you want to use the same database used by the EPI2ME 16S workflow for the bacterial 16S gene, you can go to BioProject 33175, click Send to, select Complete Record and File, set the Format to FASTA, and then click Create File; the corresponding taxonomyTsv file can then be created with:

Rscript /path/to/TaxonomyTsv_from_fastaNCBI.R \
  dbSequencesFasta="/downloads/Sequence.fasta" \
  dbTaxonomyTsv="./path/to/output/dbTaxonomy.tsv" \
  ENTREZ_KEY="myentrezkey"
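For completeness, a sketch of installing the two required packages, assuming a standard R setup (taxize comes from CRAN, Biostrings from Bioconductor):

# run once, in an R session
install.packages("taxize")
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Biostrings")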

MaestSi commented 1 year ago

The code you quoted generates the taxonomy TSV file. What you need to have is R with the required packages installed. SM
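For orientation, the output is a two-column, tab-separated file mapping sequence IDs to taxonomy strings; a hypothetical sketch of what its first lines might look like (the accessions and lineages here are made up for illustration):

Feature ID	Taxon
NR_114042.1	Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia
NR_113957.1	Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus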

annegilewski commented 1 year ago

Oh okay, so I plug that code into R with the packages--complete with the Rscript call too?--fix the path names, and it should show up. THEN I put that path into the params section of the configuration file?

MaestSi commented 1 year ago

You can run the Rscript command from the terminal; it will execute the R executable from the same environment (if you are using R installed with conda, for example). Then, as you said, you will have to put that path in the config file. SM
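A quick way to confirm which Rscript the terminal will use, assuming a conda-based R installation (the environment name is hypothetical):

conda activate r_env      # hypothetical environment containing R, taxize, Biostrings
which Rscript             # should point inside the conda environment
Rscript --version         # confirms the interpreter the script will run under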

annegilewski commented 1 year ago

Hello again! I was able to get the .tsv file created. I'm now stuck on mounting to Docker. I shared the file via Docker with the MetONTIIME directory on my laptop, but I am getting an error message from Terminal that it can't be found. I saw that every run instance gets a new directory in /Users/path/work/uniqueID, so I can't point to that file. The image is loaded into Docker. Thoughts?

MaestSi commented 1 year ago

Hi, I do not understand exactly what kind of issue you are having. What could not be found? If the answer is "some files on your laptop", then did you remember to mount the relevant directories so that Docker can access them? Please have a look at this issue. SM
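One quick sanity check is to list the directory through the same mount the pipeline will use; a sketch, with a placeholder image name (substitute the actual MetONTIIME image and your own paths):

docker run --rm -v /Users/name:/Users/name <metontiime-image> ls -l /Users/name

If the listing works on the host but fails inside the container, the mount (or Docker Desktop's file-sharing settings) is the problem.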

annegilewski commented 1 year ago

Gotcha--that helps out. We can close this issue and I'll create a new one if I run into something else. Thank you so much!

MaestSi commented 1 year ago

Great! Have a nice day. SM