Open mikesha2 opened 2 years ago
Kind of new to this. First you would need an Ubuntu server set up, and then install Docker? Kind of confused about how to get this all installed if you're starting from scratch ...
Hi! To explain it simply:
Docker provides a thin layer between the host OS (in this case macOS) and a Linux kernel. This means that once a container image is built, you can run it on arbitrary machines through Docker.
This magic only works when the architecture is the same (for example running an x86_64 Linux container on an x86_64 processor). Otherwise, Docker will resort to emulation of x86_64, which is far slower. For Apple Silicon Macs, the CPU architecture is ARM64.
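A quick way to see which case you are in is to compare the host architecture with the platform of the image you pull. This is only a minimal sketch: the helper name `docker_platform_for` is made up for this example, and it covers just the two architectures discussed here.

```shell
# Map a CPU architecture name (as printed by `uname -m`) to the
# matching Docker platform string. Hypothetical helper for illustration.
docker_platform_for() {
  case "$1" in
    arm64|aarch64) echo "linux/arm64" ;;  # Apple Silicon reports arm64
    x86_64|amd64)  echo "linux/amd64" ;;  # Intel/AMD machines
    *)             echo "unknown" ;;
  esac
}

host_arch="$(uname -m)"
echo "host: ${host_arch} -> $(docker_platform_for "${host_arch}")"
```

If the image's platform differs from the one printed here, Docker falls back to emulation, which is the slow path described above.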
Fortunately, the people over at Canonical spent a lot of effort making an ARM64 version of Ubuntu. Additionally, the people at Docker made an ARM64 version of Docker which runs at native speed on Apple Silicon. The result is this:
That ARM64 Docker container is located at the link I posted above (https://hub.docker.com/r/cms6712/kraken2).
In short: All the end user needs to do is install Docker for Mac, pull the linked Docker container, and download/build a kraken2 database.
You can probably replace the instructions for Mac support with:
docker pull cms6712/kraken2
from Terminal.
Does that help?
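Once the image is pulled and a database is unpacked on the Mac, classifying could look roughly like the sketch below. Assumptions, none of which are confirmed in this thread: `kraken2` is on the PATH inside the image, the image has no entrypoint that changes how arguments are handled, and `/db` and `/data` are arbitrary mount points picked for this example. The function only prints the command so you can inspect it before running it.

```shell
# Print (do not run) a `docker run` command that mounts a host database
# directory and a host reads directory into the container.
# /db and /data are arbitrary container paths chosen for this sketch.
kraken2_docker_cmd() {
  db_dir="$1"      # host directory holding hash.k2d, opts.k2d, taxo.k2d
  reads_dir="$2"   # host directory holding the FASTQ file
  reads_file="$3"  # FASTQ file name inside reads_dir
  printf 'docker run --rm -v %s:/db -v %s:/data cms6712/kraken2 kraken2 --db /db /data/%s\n' \
    "$db_dir" "$reads_dir" "$reads_file"
}

# Example with placeholder host paths:
kraken2_docker_cmd "$HOME/k2_pluspf_16gb_20220607" "$HOME/reads" test.fastq
```

The container only sees directories you mount with `-v`, so both the database and the reads have to be passed in this way.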
I think it's clearer. But I would need Ubuntu running on my Mac as well, no? At present I don't ...
No. The point is that Docker Desktop supplies the Linux kernel (it runs a lightweight Linux VM for you), and the container image is literally Ubuntu with kraken2 installed. You don’t do anything except follow those three steps.
Got it working! Do I have to have Docker open and running every time I want to use kraken? Currently trying to build the database! It's taking forever. It also doesn't help that my computer shut down overnight for a software update ... lol
I am having issues when classifying. Is there a way to assign RAM? I get this error: "Loading database information...classify: Error reading in hash table". I am running an M1 Max with 64 GB of RAM. The database is the 16 GB hash file, so there should be enough. Any help from a fellow Mac user would be appreciated!
I would suggest using a pre-built database, as linked above: https://benlangmead.github.io/aws-indexes/k2
As you can see, some of the databases get quite large (I see one that's 96.3 GB), which is probably meant for classifying on a cluster. I get pretty good results with the databases capped at 16 GB.
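On the RAM question: when running through Docker Desktop, containers only get the memory assigned to Docker's VM (adjustable under Settings > Resources), not the full RAM of the Mac. A rough sanity check is to compare the size of `hash.k2d` with what Docker reports. This is a sketch only: `fits_in_memory` is a made-up helper, and the 1 GiB headroom is an arbitrary margin, not a number from the kraken2 documentation.

```shell
# Succeed if a hash table of $1 bytes fits in $2 bytes of memory while
# leaving $3 bytes of headroom (default 1 GiB, an arbitrary margin).
fits_in_memory() {
  hash_bytes="$1"
  mem_bytes="$2"
  headroom="${3:-1073741824}"
  [ $((hash_bytes + headroom)) -le "$mem_bytes" ]
}

# Example (not run here; the database path is a placeholder):
#   hash_bytes=$(wc -c < /path/to/db/hash.k2d)
#   mem_bytes=$(docker info --format '{{.MemTotal}}')
#   fits_in_memory "$hash_bytes" "$mem_bytes" \
#     && echo "database should fit" \
#     || echo "raise the Docker Desktop memory limit or use a smaller database"
```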
The link above is exactly where I downloaded the database from. No issues there. However, I get this error when trying to classify: "Loading database information...classify: Error reading in hash table"
Which one did you download? Just checked and mine still works fine with k2_pluspf_16gb_20220607
I downloaded k2_standard_08gb_20220926.
Try a slightly older one.
They should just work:
Do you have paired reads, or two single direction reads?
I'm running the following command:
kraken2 --db path/to/k2_pluspf_16gb_20220607/ file_1.fastq.gz file_2.fastq.gz > outputFile
It's actually Nanopore data. I generated the FASTQ using the new ONT dorado package.
This is what I am running:
% /Users/derekstein/kraken2/kraken2-master/kraken2-dir/kraken2 --db /Users/derekstein/kraken2/kraken2-master/k2_pluspfp_16gb_20220607 /Users/derekstein/vsc_projects/dorado/fastq/test.fastq > outputFile
and I get this error:
Loading database information...classify: Error reading in hash table
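One thing worth ruling out is a truncated or partially extracted database, since a hash-table read failure can happen when `hash.k2d` is incomplete. The sketch below only checks that the three files a Kraken 2 database directory normally contains exist and are non-empty. `check_k2_db` is a made-up helper name, and passing this check does not prove the files are intact. Note also that the command above appears to call a locally built kraken2 binary rather than the one inside the Docker container.

```shell
# Report the first missing or empty database file in directory $1,
# or print "ok" if hash.k2d, opts.k2d and taxo.k2d are all present.
check_k2_db() {
  db_dir="$1"
  for f in hash.k2d opts.k2d taxo.k2d; do
    if [ ! -s "$db_dir/$f" ]; then
      echo "missing or empty: $f"
      return 1
    fi
  done
  echo "ok"
}

# Example (not run here; the path is a placeholder):
#   check_k2_db /path/to/k2_pluspf_16gb_20220607
```

If the files are all there, comparing their sizes against the sizes listed on the download page is a reasonable next step.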
Trying to clear up some confusion in the manual.
There's really no need to maintain macOS-compatible builds, as the manual currently describes.
kraken2 can be built on the arm64v8/ubuntu image (it's in the package registry), which runs at essentially native speed in Docker Desktop for M1 Macs. I suggest removing the entire paragraph about building on Mac, and changing it to "Use Docker Desktop if on M1 Macs, and pull the arm64v8 image."
https://hub.docker.com/r/cms6712/kraken2
For reference, I was doing the stupid thing and pulling the x64 image from staphb/kraken2, and running kraken2 via emulation, which was about 20x slower.
The Dockerfile is also literally only 2 lines
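For anyone curious, a two-line Dockerfile along these lines might look like the sketch below. To be clear, this is a guess and not the actual file behind cms6712/kraken2: it assumes kraken2 is installed from Ubuntu's apt repositories on the arm64v8/ubuntu base image, and the helper name and the `kraken2-arm64` tag are made up for this example.

```shell
# Write a hypothetical two-line Dockerfile into directory $1.
# Assumption: kraken2 is available as an apt package on arm64v8/ubuntu.
write_kraken2_dockerfile() {
  out_dir="$1"
  mkdir -p "$out_dir"
  cat > "$out_dir/Dockerfile" <<'EOF'
FROM arm64v8/ubuntu
RUN apt-get update && apt-get install -y kraken2
EOF
}

# Example (not run here):
#   write_kraken2_dockerfile ./kraken2-arm64
#   docker build -t kraken2-arm64 ./kraken2-arm64
```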