Running MAGsearch for Christy
Christy G. asked me to run MAGsearch for her, and I thought I'd document it this time!
1. first, sketch the genomes
I grabbed all of her genomes and then ran:
sourmash sketch dna -p k=31,scaled=1000 *
in the directory containing the FASTA files.
I then put them in a zip file:
zip -r christy-2022.09.25.zip *.sig
and transferred them to farm (our HPC).
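Before zipping, it's worth a quick check that every genome actually got a sketch. Here's a rough sketch of how I'd do that, assuming sourmash wrote one `<input>.sig` next to each input file (which is what the `*.sig` glob in the zip command relies on). The `missing_sketches` name is just a hypothetical helper, not part of sourmash:

```shell
# List inputs in a directory that have no matching, non-empty sketch yet.
# Assumption: each input "<name>" has its sketch at "<name>.sig" in the same directory.
missing_sketches() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case "$f" in
            *.sig|*.zip) continue ;;   # skip the sketches and the archive itself
        esac
        # -s is true only if the file exists and is not empty
        [ -s "$f.sig" ] || printf '%s\n' "${f##*/}"
    done
}
```

Running `missing_sketches .` in the FASTA directory should print nothing if all is well.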
2. unpack the sketches and generate a list
On farm, I went to my MAGsearch directory:
cd ~ctbrown/scratch/magsearch
mkdir query.christy-2022.09.25
and unzipped the sketches:
unzip ~/transfer/christy-2022.09.25.zip -d query.christy-2022.09.25
and made a list of the files relative to the base MAGsearch directory:
ls -1 query.christy-2022.09.25/* > query.christy-2022.09.25.txt
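Since the search reads its queries from this list, a typo or an empty sketch here only shows up later. A minimal sanity check might look like this, run from the base MAGsearch directory so the relative paths resolve. `check_siglist` is a hypothetical helper name:

```shell
# Report any path in the list that is missing or empty.
# Returns non-zero if at least one bad path was found.
check_siglist() {
    list=$1
    bad=0
    while IFS= read -r path; do
        [ -n "$path" ] || continue        # ignore blank lines
        if [ ! -s "$path" ]; then
            printf 'missing or empty: %s\n' "$path" >&2
            bad=1
        fi
    done < "$list"
    return "$bad"
}
```

For example, `check_siglist query.christy-2022.09.25.txt && echo ok`.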
3. make a configuration file
I made a new copy of the config file:
cp config.yml config-christy-2022.09.25.yml
and then added the search-specific things:
# unique query name
query_name: christy-2022.09.25
# list of paths of query signatures - 1 or more.
query_sigs: query.christy-2022.09.25.txt
# catalog to search - list of paths of subject signatures
#catalog: /group/ctbrowngrp/sra_search/catalogs/metagenomes
catalog: catalog.sub
# containment threshold to use
threshold: 0.01
# k-mer size to use
ksize: 31
# scaled to use
scaled: 1000
# where to put the results
out_dir: "output.magsearch"
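The ksize and scaled values here need to agree with what was used for sketching in step 1 (k=31, scaled=1000), and query_sigs has to point at a list that exists. Here's a small pre-flight check under those assumptions. `cfg_get` and `check_config` are hypothetical helpers, and `cfg_get` is not a real YAML parser; it only handles flat `key: value` lines like the ones above:

```shell
# Read a top-level "key: value" entry from a flat YAML file.
# Comment lines start with "#", so they never match "^key:".
cfg_get() {
    sed -n "s/^$1:[[:space:]]*//p" "$2" | head -n 1 | tr -d '"'
}

# Compare the config against the sketch parameters from step 1
# and make sure the query list is where the config says it is.
check_config() {
    cfg=$1
    [ "$(cfg_get ksize "$cfg")" = "31" ] || { echo "ksize mismatch" >&2; return 1; }
    [ "$(cfg_get scaled "$cfg")" = "1000" ] || { echo "scaled mismatch" >&2; return 1; }
    [ -f "$(cfg_get query_sigs "$cfg")" ] || { echo "query list not found" >&2; return 1; }
}
```

Usage would be something like `check_config config-christy-2022.09.25.yml && echo ok`, again from the base MAGsearch directory.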
4. start an srun session
Next I started screen and ran a beefy srun, and then ran a test.
Note that this is a test because I'm only searching a small catalog, catalog.sub. This makes sure the queries etc. can all be loaded before we run the thing for a day or two!
5. check logs for test
It looks like all went well:
% cat output.magsearch/logs/sra_search.k31.log
[2022-09-25T12:56:54Z INFO sra_search] Loading queries
[2022-09-25T12:56:54Z INFO sra_search] Loaded 27 query signatures
[2022-09-25T12:56:54Z INFO sra_search] Loading siglist
[2022-09-25T12:56:54Z INFO sra_search] Loaded 14 sig paths in siglist
[2022-09-25T12:56:54Z INFO sra_search] Processed 0 search sigs
(The last line is output only every so often, so more than 0 search sigs were processed.)
6. run for realz
Remove the test output, edit the config file so that catalog points at the full metagenome catalog (the commented-out catalog line in step 3) instead of catalog.sub, and run!
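Once the real run is going, the same log check from step 5 can be scripted rather than eyeballed. This is a minimal sketch that assumes the log keeps the "Loaded N query signatures" wording shown above, and that each query file holds exactly one signature (true here, since only one k-mer size was sketched). `loaded_queries` and `queries_match` are hypothetical helper names:

```shell
# Extract N from the first "Loaded N query signatures" line in a log.
loaded_queries() {
    sed -n 's/.*Loaded \([0-9][0-9]*\) query signatures.*/\1/p' "$1" | head -n 1
}

# Succeed only if the log reports as many queries as the list has non-blank lines.
queries_match() {
    log=$1
    list=$2
    expected=$(grep -c . "$list" || true)   # grep -c still prints 0 when nothing matches
    [ "$(loaded_queries "$log")" = "$expected" ]
}
```

For this search that would be `queries_match output.magsearch/logs/sra_search.k31.log query.christy-2022.09.25.txt`.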