fanglab / nanodisco

nanodisco: a toolbox for discovering and exploiting multiple types of DNA methylation from individual bacteria and microbiomes using nanopore sequencing.

Installation / R errors? #47

Open phdegnan opened 2 years ago

phdegnan commented 2 years ago

Fang Lab,

Unfortunately, the singularity installation option wasn't working for our campus cluster or my lab machines. As such, I had to set up a conda environment on our cluster and attempted to leverage available modules with compatible software versions of bwa, R, etc. After fixing file paths in your scripts, things started to look like they were working. However, using your test data sets for both E. coli and the metagenome, I'm hitting two different R errors:

$ nanodisco characterize -p 4 -b Ecoli -d dataset/EC_difference.RDS -o analysis/Ecoli_motifs -m GATC,CCWGG,GCACNNNNNNGTT -t nn -r reference/Ecoli_K12_MG1655_ATCC47076.fasta
[2022-06-13 12:34:39] Load supplied current differences.
[2022-06-13 12:34:46] Check current differences file version.
[2022-06-13 12:34:46] Determine motif signature center.
[2022-06-13 12:34:46] Process GATC.
[2022-06-13 12:34:46] Tag GATC occurrences.
[2022-06-13 12:34:55] Score GATC modified position.
[2022-06-13 12:34:59] Process CCWGG.
[2022-06-13 12:34:59] Tag CCWGG occurrences.
[2022-06-13 12:35:06] Score CCWGG modified position.
[2022-06-13 12:35:09] Process GCACNNNNNNGTT.
[2022-06-13 12:35:09] Tag GCACNNNNNNGTT occurrences.
[2022-06-13 12:35:15] Score GCACNNNNNNGTT modified position.
Error in { : task 1 failed - "Invalid unit"
Calls: find.signature.center -> %do% ->
Execution halted

It doesn't matter how many motifs I input (1, 3, and all 4 tested). It fails after the last one.

Is there a way to add verbose reporting to R within the context of your code? The error seems to originate around line 63 of characterize.R, which refers to the find.signature.center function at line 2179 of analysis_functions.R.

After this failure, I then tried the metagenome example. The first two commands ran without error. The third errored out:

$ nanodisco plot_binning -r reference/metagenome.fasta -u analysis/binning/methylation_binning_MGM1_motif.RDS -b MGM1_motif -o analysis/binning -a reference/motif_binning_annotation.RDS --MGEs_file dataset/list_MGE_contigs.txt
[2022-06-13 14:23:15] Prepare default metagenome annotation.
[2022-06-13 14:23:16] Load additional annotation.
[2022-06-13 14:23:17] Plot binning.
Error in unit(unclass(x), attr(x, "unit"), attr(x, "data")) : Invalid unit
Calls: plot.tsne.motifs.score ... convertUnit -> upgradeUnit -> upgradeUnit.unit -> unit
Execution halted

Given your intimate familiarity with your code - any suggestions you have would be most welcome.

My best guess is that this is an R package issue? Maybe? Since I was using a system install of R 4.1.2, there were some packages that I couldn't overwrite/update. What version of R would you recommend if I am installing it fresh within the conda environment?

Regards, Patrick

fanggang commented 2 years ago

Thank you for your interest, Patrick. Alan will help when he gets a chance. Just wanted to add that one of the reasons we put the package in Singularity was to avoid errors caused by R/package versions, which happened to us as well. So, for the long term, it is likely a good idea to still try to get Singularity set up (not sure why it didn't work on your cluster or lab machine, but it is likely fixable), because new errors can occur in new R/package versions.

phdegnan commented 2 years ago

Oh, I get the rationale for singularity - code dependencies are a pain in the butt. However, my lab Mac was unable to install VirtualBox (the prereq for singularity), and our campus cluster's sys admin doesn't have time to install it ATM and I don't have time to wait. Looking forward to Alan sharing his insight.

jflopezfernandez commented 1 year ago

Hey, @phdegnan, have you tried installing the specific package versions manually yourself?

Disclaimer: I am not Alan (nor am I affiliated with the project in any way), but I am in the process of using it for a project myself.

You can see the exact versions you need in the post-installation script. At that point, you can try any of the methods mentioned in this StackOverflow post and see if that works.

At the end of the day, as long as you can install the right package versions somehow (meaning you either build them from source or install pre-compiled binary versions), the program should run.[^theory-vs-practice]

It's janky, but this solution not only circumvents the VirtualBox installation problem entirely, it should also let the analysis execute faster because it's not being run indirectly via a hypervisor.
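Before pinning anything, it can help to see which versions are actually installed so they can be compared by hand against the post-installation script. The following is a minimal sketch, not part of nanodisco: the function name report_r_env is hypothetical, and the packages to check are passed as arguments (grid is the default only because the second traceback in this thread ends in grid's unit functions).

```shell
#!/bin/sh
# Hypothetical helper (not part of nanodisco): print the R version and the
# installed version of each package named on the command line.
report_r_env() {
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found on PATH" >&2
        return 1
    fi
    # Default to "grid" when no package names are given.
    [ "$#" -gt 0 ] || set -- grid
    # Package names reach R as trailing command-line arguments.
    Rscript -e '
        cat(R.version.string, "\n")
        for (p in commandArgs(trailingOnly = TRUE)) {
            v <- tryCatch(as.character(packageVersion(p)),
                          error = function(e) "not installed")
            cat(p, v, "\n")
        }
    ' "$@"
}
```

Calling it as `report_r_env grid ggplot2` would list both packages; the package names there are only examples, and the authoritative list is whatever the post-installation script pins.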

Hope this helps, Jose

[^theory-vs-practice]: Of course, this is only true in theory. In practice, theory and practice can differ greatly.

touala commented 1 year ago

Hello @phdegnan,

Sorry for the late answer, but I think it's worth noting that I was able to install and use singularity on OSX (Mac) during the development of nanodisco. I used the Singularity-Desktop beta version, but it seems to be discontinued (here). They also offer a solution using docker compose, but I've not tried it (here).

In your situation, with access to a cluster, I would try to install singularity with conda. You can try running this old version:

conda create --name singularity -c conda-forge singularity
conda activate singularity
singularity --version
# singularity version 3.8.6

Although there are known issues (this), they seem to be fixed in the latest singularity version. Importantly, the organisation of singularity development recently changed, and the tool is now maintained under the apptainer name. This should be fully backward compatible. I've successfully run the nanodisco commands from your first message with the following installation of apptainer:

conda create --name apptainer -c conda-forge apptainer
conda activate apptainer
apptainer --version
# apptainer version 1.1.5
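Because the tool may be on PATH under either name depending on which conda environment is active, a wrapper script can check for both. This is only a sketch under that assumption; the function name pick_runtime is hypothetical and not part of nanodisco, singularity, or apptainer.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the first container runtime found
# on PATH, preferring apptainer over the older singularity name.
pick_runtime() {
    for candidate in apptainer singularity; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no container runtime found on PATH" >&2
    return 1
}

# Report the version of whichever runtime is active, if any.
if runtime=$(pick_runtime); then
    "$runtime" --version
fi
```

The helper only reports which name is available; it does not check whether the installed version is affected by the known issues mentioned above.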

Lastly, apptainer seems to be usable with MacOS, but I didn't test it: here.

Best,

Alan