GregorySchwartz / too-many-cells

Cluster single cells and analyze cell clade relationships with colorful visualizations.
https://gregoryschwartz.github.io/too-many-cells/
GNU General Public License v3.0

Prebuilt binaries for Linux #14

Open mcfefa opened 4 years ago

mcfefa commented 4 years ago

Hi @GregorySchwartz, I am trying to run too-many-cells on a Linux cluster where I don't have sudo access and docker is not a viable installation option. Can I use prebuilt Linux binaries of too-many-cells? I have not been able to use the docker approach or compile with stack on my OSX laptop either.

chris-rands commented 4 years ago

Docker works for me, but instructions for running with singularity would be nice (avoiding the need for sudo). I got this far:

singularity pull docker://gregoryschwartz/too-many-cells:0.2.2.0
singularity build too-many-cells docker://gregoryschwartz/too-many-cells:0.2.2.0
singularity run too-many-cells make-tree -h

However, when I try to run it with my actual data, I get an R-related error:

cannot find system Renviron
Fatal error: unable to open the base package

GregorySchwartz commented 4 years ago

@chris-rands I'm not familiar with singularity, but if you can get added to the docker group then that would bypass the need for sudo (after being added to the group, that is). For the error, you may need to edit the Dockerfile to point R_HOME to where R is located on the image, possibly.

GregorySchwartz commented 4 years ago

@mcfefa I considered making a prebuilt binary, but I'm not sure how it would link in with R for some of the downstream processes. What issues do you have compiling?

GregorySchwartz commented 4 years ago

@mcfefa @chris-rands I have packaged too-many-cells for nix, and I recommend trying that out (see the documentation). It's a reproducible derivation which should take care of all dependencies and only requires root once, when installing nix.

ccruizm commented 4 years ago

Good day!

I have been trying to run the package using singularity. I could build the container and I am able to start running the pipeline. I start an interactive session on our HPC:

srun -n 12 --mem 128G --time 12:00:00 --gres=tmpspace:100G --pty bash

Then I run the singularity command for the .sif container:

singularity run too-many-cells_0.2.2.0.sif make-tree --matrix-path ./cells.csv --labels-file ./labels.csv --draw-collection "PieRing" --output ./out > clusters.csv

It starts running with no issues. However, when reaching one of the steps, it stops:

Sketching tree [=======================>..................................] 40% '/scratch/1448065: openTempFile: does not exist (No such file or directory)

After the error, I typed 'df -h $TMPDIR':

Filesystem                  Size        Used    Avail   Use%    Mounted on
/dev/mapper/vg_tmp-1448065  100G     33M    100G    1%      /scratch/1448065

What do you think the problem is? The dir is there but somehow it does not recognise it. I have tried:

SINGULARITY_LOCALCACHEDIR=$TMPDIR
SINGULARITY_CACHEDIR=$TMPDIR
SINGULARITY_TMPDIR=$TMPDIR
export SINGULARITY_LOCALCACHEDIR
export SINGULARITY_CACHEDIR
export SINGULARITY_TMPDIR

and these have not worked either. I have also used the -W option for singularity run, which should force the link to a temporary directory, but I still get the same error.

I do not know what else to do. We are only allowed to use singularity on our HPC, so neither docker nor nix is a viable option.

Thanks for the help

GregorySchwartz commented 4 years ago

I have no experience with singularity so I might not be much help. However, is it a permissions issue? Can you change the $TMPDIR to something you own, maybe in the same directory as the cells.csv file?

ccruizm commented 4 years ago

Unfortunately, that does not seem to be the problem. I own the dir created in /scratch/, and even after giving all the permissions, it still fails.

GregorySchwartz commented 4 years ago

What about using the same directory as the cells.csv file?

ccruizm commented 4 years ago

Finally, I found a way to make it work. I bound the PATH for the singularity container and now it finds the TMP dir.
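The exact command is not shown in the comment above. As a minimal sketch, assuming "bound the PATH" means bind-mounting the scratch directory into the container, the helper below only prints a candidate command rather than running it. The function name build_tmc_cmd is hypothetical; --bind and the SINGULARITYENV_ prefix are standard Singularity features, but whether this matches what was actually done on that cluster is an assumption.

```shell
#!/bin/sh
# Sketch only: print (do not run) a singularity command that bind-mounts the
# host scratch directory at the same path inside the container and sets
# TMPDIR there. build_tmc_cmd is a hypothetical helper, not part of
# too-many-cells or singularity.
build_tmc_cmd() {
    image="$1"      # e.g. too-many-cells_0.2.2.0.sif
    scratch="$2"    # host temporary directory, e.g. "$TMPDIR"
    shift 2
    # Arguments are joined with spaces and not quoted, so paths that
    # contain spaces would need quoting by hand.
    printf 'SINGULARITYENV_TMPDIR=%s singularity run --bind %s:%s %s %s\n' \
        "$scratch" "$scratch" "$scratch" "$image" "$*"
}

# Print the command for the make-tree call used earlier in this thread.
build_tmc_cmd too-many-cells_0.2.2.0.sif "${TMPDIR:-/tmp}" \
    make-tree --matrix-path ./cells.csv --labels-file ./labels.csv \
    --draw-collection PieRing --output ./out
```

Printing the command first makes it easy to check the bind path against the output of df -h $TMPDIR before running it inside the srun session.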

One quick question: can the file containing labels (e.g. cell type, sample, etc.) contain only one piece of metadata, or can I add several columns with different info? If so, how should I name each column? I have checked the options for make-tree, but there doesn't seem to be one to specify which column of the labels.csv file to use.

Thanks in advance

GregorySchwartz commented 4 years ago

@ccruizm You would have one file per piece of metadata, with that metadata's values in the label column, and you would pass that specific file in as the labels to color by. The format is item,label for the barcode (item) and metadata (label) columns.
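For anyone starting from a single multi-column metadata table, the one-file-per-metadata layout described above could be produced with a small script. This is a sketch under assumptions: the input is a simple comma-separated file with no quoted commas, its first column is the barcode, and the function name split_labels, the column names, and the labels_<column>.csv output names are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: split a metadata CSV (barcode in column 1, one metadata field
# per remaining column) into one item,label file per metadata column.
# Assumes plain CSV without quoted commas; split_labels is a hypothetical name.
split_labels() {
    meta="$1"
    outdir="$2"
    mkdir -p "$outdir"
    awk -F, -v outdir="$outdir" '
        NR == 1 {
            # Header row: open one output file per metadata column.
            ncol = NF
            for (i = 2; i <= ncol; i++) {
                file[i] = outdir "/labels_" $i ".csv"
                f = file[i]
                print "item,label" > f
            }
            next
        }
        {
            # Data rows: write barcode plus that column value to each file.
            for (i = 2; i <= ncol; i++) {
                f = file[i]
                print ($1 "," $i) > f
            }
        }
    ' "$meta"
}

# Small made-up example with two metadata columns.
demo=$(mktemp -d)
printf 'item,celltype,sample\nAAAC,T cell,s1\nAAAG,B cell,s2\n' > "$demo/metadata.csv"
split_labels "$demo/metadata.csv" "$demo/labels"
cat "$demo/labels/labels_celltype.csv"
```

Each resulting file can then be given to make-tree through --labels-file, one run per metadata field.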

DiracZhu1998 commented 2 years ago

@GregorySchwartz Dear Gregory Schwartz, thank you for giving us such a great toolkit. I have the same problem: I don't have sudo access, and my own laptop doesn't have enough memory to run it locally. If you could create a Conda environment yml for TooManyCells, I believe it would be more robust and universal.

GregorySchwartz commented 2 years ago

@DiracZhu1998 Have you tried Singularity? I have had no issues using Singularity with the Docker image, and that should not require sudo (and, unlike nix, is usually installed on HPCs).

DiracZhu1998 commented 2 years ago

Yeah, I can run it properly. Thanks! I also agree that TooManyCells has great potential to deal with several problems in the traditional scRNA-seq workflow:

1) Identifying rare cell types and major cell types at the same time
2) Cell clade inheritance
3) The tSNE/UMAP projection problem: dimension reduction for visualization inevitably makes high-dimensional points overlap in low dimensions

I have successfully run too-many-cells with a batch-corrected matrix (Seurat integrated matrix) as input and got a reasonable clumpiness measure plot. I'm still testing other parameters to make the tree easier to read and more meaningful.
