Bioconductor / workshop-contributions

Repository for managing contributions to workshop.bioconductor.org

[Bioc2024] Bioconductor Workshop for BiocPy #89

Open jkanche opened 1 month ago

jkanche commented 1 month ago

Hi, I am presenting a workshop next week on BiocPy: interoperability between R and Python.

Most of the content is in Python, so folks will be following along in a Jupyter environment that already contains all the necessary packages. Do I provide a Docker image with Jupyter Notebook and the packages preinstalled? How do I do this?

I currently use Quarto to publish the tutorial website, and the source is hosted here: https://github.com/BiocPy/BiocWorkshop2024

almahmoud commented 1 month ago

We've only ever done RStudio containers in the past, but happy to do the manual work to support jupyter. Please let me know the container, port at which jupyter is exposed, and name/description for your workshop, and I'll do my best.

I'll do my best to get it up by tomorrow. Given the tight deadline and the conference starting in a few days, if it's not done by tomorrow I'd encourage you to also plan for a backup such as Colab, but let's try using the platform first.

jkanche commented 1 month ago

Thank you @almahmoud, I would like to use https://jupyter-docker-stacks.readthedocs.io/en/latest/ and it runs on port 8888

almahmoud commented 1 month ago

Hey @jkanche, you pointed to the general Jupyter Docker docs, not to a specific container. Please let me know which of the many Jupyter containers you'd want to use. Better yet, if you can create a custom container with your packages etc. pre-installed on top of that general Jupyter container, that'd be ideal. If you don't have experience using Docker and/or don't need a specific container, just any Jupyter environment, please let me know what PyPI/conda/R packages you need and I'll try to build it on my side for you. Also, do you need a Jupyter container with both R and Python kernels, or just a Python kernel?

jkanche commented 1 month ago

Hi @almahmoud , thank you so much for helping me out here. I tried to use the bioconductor_docker:devel to create an image but ran into many issues. It would be super helpful to create one with both the python and R kernel. I have the packages listed in the workshop repository:

Python dependencies: https://github.com/BiocPy/BiocWorkshop2024/blob/master/requirements.txt
R dependencies: https://github.com/BiocPy/BiocWorkshop2024/blob/master/rpackages.R

If having both R and Python is too much trouble, just a simple jupyter image with the Python packages installed would also be very helpful. I would really appreciate any help here.

jkanche commented 1 month ago

quick update, I was able to publish an image containing the notebook and the relevant python packages to github registry: https://github.com/BiocPy/BiocWorkshop2024/pkgs/container/biocworkshop2024%2Fbuilder and the dockerfile used for the build - https://github.com/BiocPy/BiocWorkshop2024/blob/master/Dockerfile

Jupyter Notebook runs on port 8889. It uses tokens; do you know if there's a way to disable token-based authentication?
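
For reference, one common way to turn off token and password authentication is to pass empty values for the corresponding Jupyter settings at startup. The snippet below is only a sketch that assembles such a command line. The helper name `jupyter_launch_args` is hypothetical, and whether the image honours `ServerApp.*` or the older `NotebookApp.*` settings is an assumption that depends on the Jupyter version inside the container. Running without authentication is only reasonable when something else in front of the container controls access.

```python
import shlex


def jupyter_launch_args(port: int, app: str = "ServerApp") -> list:
    """Build a Jupyter launch command with token and password auth disabled.

    `app` is the settings prefix; "ServerApp" is assumed here, while older
    classic-notebook images may expect "NotebookApp" instead.
    """
    return [
        "jupyter", "lab",
        "--ip=0.0.0.0",          # listen on all interfaces inside the container
        f"--port={port}",
        "--no-browser",
        f"--{app}.token=",       # empty token: no token prompt
        f"--{app}.password=",    # empty password: no password screen
    ]


# Port 8889 is the port mentioned above for this image.
print(shlex.join(jupyter_launch_args(8889)))
```

The printed command could be used as the container's startup command, or the same two settings could be placed in a Jupyter config file instead.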

almahmoud commented 1 month ago

Hey @jkanche, I actually made a container for you already, and it's now deployed to the instance at workshop.bioconductor.org. Please try it out and let me know if it works.

jkanche commented 1 month ago

Awesome, thank you so much. I am on a password screen; do you know what the default password is?

almahmoud commented 1 month ago

Sorry about that, the startup command didn't take effect as expected the first time. Try again now: there should be no password, and you should have both R and Python kernels. Here is my simple Dockerfile:

FROM jupyter/r-notebook:r-4.3.1
USER root
RUN apt update -qq && \
    apt install python3-dev build-essential -y && \
    curl -O https://raw.githubusercontent.com/Bioconductor/bioconductor_docker/devel/bioc_scripts/install_bioc_sysdeps.sh && \
    bash install_bioc_sysdeps.sh 3.18 && \
    pip install -r <(curl -s https://raw.githubusercontent.com/BiocPy/BiocWorkshop2024/master/requirements.txt) && \
    curl -s https://raw.githubusercontent.com/BiocPy/BiocWorkshop2024/master/rpackages.R | Rscript -

I used the latest available R notebook image to make the Jupyter setup easiest, but that means you have to use Bioc 3.18 and R 4.3.1. Let me know if that's an issue and I can try to make an updated container.

jkanche commented 1 month ago

@almahmoud thank you so much. I had a couple of issues during this session: 1) file permission issues when packages download something, and 2) the sqlite version shipped in the container is too old.

(2) can be fixed by:

# Download and build SQLite3 from source
RUN wget --no-check-certificate https://www.sqlite.org/2024/sqlite-autoconf-3450300.tar.gz && \
    tar -xvf sqlite-autoconf-3450300.tar.gz && \
    cd sqlite-autoconf-3450300 && \
    ./configure && \
    make && \
    make install && \
    export PATH="/usr/local/lib:$PATH" && \
    cd .. && \
    rm -rf sqlite-autoconf-3450300.tar.gz sqlite-autoconf-3450300

# Set environment variable for LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/local/lib

Do you know what's causing (1)?

almahmoud commented 1 month ago

I can modify the container, thank you for providing the commands! Re 1) Are you writing to /home/jovyan? I believe the default working directory might have permission issues, as I didn't account for the user in the Jupyter container on the NFS, but you shouldn't need that anyway. If you also can't write to /home/jovyan/, let me know, and if you can provide a reproducible example that'd be really helpful too.

jkanche commented 1 month ago

I was running this chunk from the container @ notebook/genomic_ranges.ipynb, which downloads the bed file to the current working directory.

from geniml.bbclient import BBClient

bbclient = BBClient(cache_folder="cache", bedbase_api="https://api.bedbase.org")
bedfile_id = "be4054acf6e3feeb4dc490e6430e358e" 
bedfile = bbclient.load_bed(bedfile_id)
peaks = bedfile.to_granges()

filter_chr22 = [x == "chr22" for x in peaks.get_seqnames()]
peaks_chr22 = peaks[filter_chr22]

print(peaks_chr22)

almahmoud commented 1 month ago

@jkanche Thanks for the details! That was my bad, I forgot to chown the git directory since it's being cloned as root at startup. It should be fixed now, and container updated! Let me know if you encounter any other issues!
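
A quick way to confirm this kind of ownership problem from inside a notebook is to compare the current user ID with the owner of the working directory and to try creating a file there. This is a minimal sketch, assuming a Linux container; the helper names `can_write` and `describe` are hypothetical and not part of the workshop code.

```python
import os
import tempfile


def can_write(directory: str) -> bool:
    """Return True if a temporary file can actually be created in `directory`."""
    try:
        # Creating a real file is a more direct check than reading mode bits,
        # which can be misleading on network file systems.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


def describe(directory: str) -> str:
    """Summarise who is running and who owns `directory`."""
    owner_uid = os.stat(directory).st_uid
    return (
        f"dir={directory} current_uid={os.getuid()} "
        f"owner_uid={owner_uid} writable={can_write(directory)}"
    )


print(describe(os.getcwd()))
```

If `owner_uid` is 0 (root) while the notebook runs as a non-root user, downloads into the working directory, such as the `cache` folder in the chunk above, would be expected to fail until the directory is chowned.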

jkanche commented 1 month ago

@almahmoud Thank you, this resolves the directory issue. Is there any way we can update the sqlite version in the container? It needs a newer version than the one available through the distros - https://github.com/Bioconductor/workshop-contributions/issues/89#issuecomment-2240163149

almahmoud commented 1 month ago

Hey @jkanche, are you not seeing the updated sqlite version? I ran your command from above and updated the container already.

jkanche commented 1 month ago

The notebook says the sqlite version is 3.43 instead of 3.45. I'm checking to see if there's another env variable I should be setting.


almahmoud commented 1 month ago

I'm currently running the command as root. When you tried that installation command, did you run it as jovyan within the container, or also as root?

jkanche commented 1 month ago

seems like the notebooks are run as jovyan


jkanche commented 1 month ago

If you are testing this, running section 1.1 from the annotate cell types notebook should give you the list of datasets.

Right now it's an error. I had the same issue before, so I know it's the sqlite version.

almahmoud commented 1 month ago

I had not tested anything, simply added your sqlite upgrade suggestion, assuming you had tested that and seen it work. I have now added conda update -y -c conda-forge libsqlite instead, which actually updates the version of sqlite you see in Python. Try it out now. Looking quickly in the container, I see:

>>> import sqlite3
>>> sqlite3.sqlite_version
'3.46.0'

Trying it out in the notebook, seems to work
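
The same check can be run from a notebook cell to see which SQLite library the running Python interpreter is linked against, which may differ from a copy built separately under /usr/local. This is a small sketch; the helper name `sqlite_at_least` is hypothetical, and the 3.45.0 threshold is an assumption taken from the versions mentioned earlier in this thread, not a documented requirement.

```python
import sqlite3
import sys


def sqlite_at_least(minimum: tuple) -> bool:
    """Compare the SQLite library linked into this interpreter with `minimum`."""
    return sqlite3.sqlite_version_info >= minimum


# sys.executable shows which Python (for example a conda one) the kernel uses.
print("interpreter:", sys.executable)
print("linked SQLite:", sqlite3.sqlite_version)
# Assumed threshold, based on the "3.43 instead of 3.45" report above.
print("meets 3.45.0:", sqlite_at_least((3, 45, 0)))
```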

jkanche commented 1 month ago

awesome! thank you very much!

jkanche commented 1 month ago

Hi @almahmoud, I am trying to build a Docker image and register both R and Python kernels. Does the version you published to workshop.bioconductor.org look something like this?

https://github.com/BiocPy/BiocWorkshop2024/blob/master/Dockerfile.bioc
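
One way to verify that both kernels ended up registered in a built image is to inspect the output of `jupyter kernelspec list --json`. The sketch below parses that JSON and reports which languages are present. The helper name `kernel_languages` is hypothetical and the sample data is illustrative only, not taken from the actual workshop container.

```python
import json


def kernel_languages(kernelspec_json: str) -> dict:
    """Map each registered kernel name to the language it declares."""
    specs = json.loads(kernelspec_json)["kernelspecs"]
    return {name: info["spec"]["language"] for name, info in specs.items()}


# Illustrative sample shaped like `jupyter kernelspec list --json` output.
sample = json.dumps({
    "kernelspecs": {
        "python3": {
            "resource_dir": "/opt/conda/share/jupyter/kernels/python3",
            "spec": {"display_name": "Python 3 (ipykernel)", "language": "python"},
        },
        "ir": {
            "resource_dir": "/opt/conda/share/jupyter/kernels/ir",
            "spec": {"display_name": "R", "language": "R"},
        },
    }
})

languages = kernel_languages(sample)
missing = {"python", "R"} - set(languages.values())
print("kernels:", languages)
print("missing languages:", sorted(missing))
```

Inside the container, the real JSON could be captured from the command and passed to the same function in place of `sample`.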