dib-lab / 2020-ibd

Analysis of publicly available metagenomic sequencing data from humans with IBD.
BSD 3-Clause "New" or "Revised" License

sourmash version for abundance preservation during signature intersection #1

Open taylorreiter opened 4 years ago

taylorreiter commented 4 years ago

sourmash 2.2.1.dev5+g7468bfe dev_0 <develop>

taylorreiter commented 4 years ago

Update the sourmash version once it is bumped on conda, then add Snakemake rules to create the greater_than_one_count_hashes.sig file and to filter the sigs against it. See sandbox/et_greater_than_1_count_hashes.ipynb for how greater_than_one_count_hashes.sig is calculated for k=31.

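# Work in a scratch directory that holds symlinks to the original signatures.
mkdir -p sandbox/greater_than_one_filt_sigs
cd sandbox/greater_than_one_filt_sigs
ln -s ../../outputs/sigs/*sig .
# Keep only hashes that are also in greater_than_one_count_hashes.sig;
# -A takes the abundances from the sample's own signature.
for infile in *sig
do
    j=$(basename "${infile}" .sig)
    sourmash signature intersect -A "${infile}" -k 31 -o "${j}_filt.sig" "${infile}" greater_than_one_count_hashes.sig
done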
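
One thing to watch in the loop above: if greater_than_one_count_hashes.sig sits in the same directory, the `*sig` glob also matches it, and on a rerun it matches the `*_filt.sig` outputs too. A minimal sketch of a guard, assuming that layout; the function names `list_unfiltered_sigs` and `print_intersect_commands` are hypothetical, and the second one only prints the commands (dry run) rather than calling sourmash:

```shell
# Hypothetical helper: list the signatures that still need filtering.
# Skips the reference signature and any *_filt.sig outputs from earlier runs.
list_unfiltered_sigs() {
    local ref="greater_than_one_count_hashes.sig"
    local f
    for f in *.sig; do
        [ -e "$f" ] || continue            # glob matched nothing
        [ "$f" = "$ref" ] && continue      # do not intersect the reference with itself
        case "$f" in
            *_filt.sig) continue ;;        # already-filtered output
        esac
        printf '%s\n' "$f"
    done
}

# Dry run: print the intersect command for each remaining signature.
print_intersect_commands() {
    local infile j
    list_unfiltered_sigs | while IFS= read -r infile; do
        j=$(basename "$infile" .sig)
        echo "sourmash signature intersect -A $infile -k 31 -o ${j}_filt.sig $infile greater_than_one_count_hashes.sig"
    done
}
```

Piping `print_intersect_commands` into `sh` would run the printed commands; the same include/exclude logic should carry over to the input function of the eventual Snakemake rule.
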
taylorreiter commented 4 years ago

next snakefile rules:

  1. run compare on filtered sigs:
mkdir -p ../filt_comp
sourmash compare -k 31 --csv ../filt_comp/filt_comp_all.csv *filt.sig
  2. convert sigs to csv:
# Separate directory for the CSV versions of the filtered signatures.
mkdir -p sandbox/greater_than_one_filt_sigs_csvs
cd sandbox/greater_than_one_filt_sigs_csvs
ln -s ../greater_than_one_filt_sigs/*filt.sig .
# Write one CSV per filtered signature.
for infile in *sig
do
    j=$(basename "${infile}" .sig)
    python ../../scripts/sig_to_csv.py "${infile}" "${j}.csv"
done
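
Until these steps are Snakemake rules, the conversion loop reruns every signature each time. A rough sketch of a guard that skips signatures whose CSV already exists and is up to date, which approximates what Snakemake would do with input/output timestamps. The names `needs_conversion` and `print_conversion_commands` are hypothetical, and the second function only prints the commands (dry run):

```shell
# Hypothetical helper: succeed when the CSV for a signature is missing or
# older than the signature itself.
needs_conversion() {
    local sig="$1"
    local csv="${sig%.sig}.csv"
    [ ! -e "$csv" ] || [ "$sig" -nt "$csv" ]
}

# Dry run: print the conversion command only for signatures that need it.
print_conversion_commands() {
    local infile
    for infile in *filt.sig; do
        [ -e "$infile" ] || continue       # glob matched nothing
        if needs_conversion "$infile"; then
            echo "python ../../scripts/sig_to_csv.py $infile ${infile%.sig}.csv"
        fi
    done
}
```

Run from sandbox/greater_than_one_filt_sigs_csvs, `print_conversion_commands` lists only the outstanding conversions.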