czbiohub-sf / nf-predictorthologs

*de novo* orthologous gene predictions from bam + bed or fasta/fastq data
MIT License
4 stars, 2 forks

Differential hash expression #26

Closed: olgabot closed this 4 years ago

olgabot commented 4 years ago

Given a CSV of sample metadata, group the samples by a metadata column, use logistic regression to find hashes enriched in particular groups, then pass those hashes downstream to hash2kmer and DIAMOND blastp to identify what they are.
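The enrichment step described here amounts to a one-vs-rest classification for each group. What follows is a minimal pure-Python sketch of that idea, not the PR's actual code. Each sample's signature is treated as a set of hashes, and hash presence/absence becomes the feature vector. For each group, a logistic regression is fit with "sample is in this group" as the label, and the hashes with the largest positive coefficients are reported as enriched. The function names (`fit_logistic`, `enriched_hashes`), the `sample_id`/`cell_type` columns, and the learning-rate and regularization values are hypothetical. A real pipeline would more likely use a library implementation such as scikit-learn's `LogisticRegression`.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, l2=0.1, lr=0.5, epochs=500):
    """Fit L2-regularized logistic regression by batch gradient descent.

    X: list of 0/1 hash-presence rows, y: list of 0/1 labels.
    Returns (weights, bias). Hypothetical helper for illustration.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def enriched_hashes(metadata, signatures, group_col, top_n=2):
    """Return, per group, the hashes with the largest positive coefficients.

    metadata: list of dicts with "sample_id" and `group_col` (the groupby key).
    signatures: dict mapping sample_id -> set of hash values.
    """
    samples = [row["sample_id"] for row in metadata]
    all_hashes = sorted(set().union(*(signatures[s] for s in samples)))
    X = [[1.0 if h in signatures[s] else 0.0 for h in all_hashes] for s in samples]

    # The "groupby" on the metadata CSV: group name -> sample row indices.
    by_group = defaultdict(list)
    for i, row in enumerate(metadata):
        by_group[row[group_col]].append(i)

    result = {}
    for group, idx in by_group.items():
        members = set(idx)
        y = [1.0 if i in members else 0.0 for i in range(len(samples))]
        w, _ = fit_logistic(X, y)
        ranked = sorted(zip(w, all_hashes), reverse=True)
        result[group] = [h for coef, h in ranked[:top_n] if coef > 0]
    return result


metadata = [
    {"sample_id": "a1", "cell_type": "A"},
    {"sample_id": "a2", "cell_type": "A"},
    {"sample_id": "b1", "cell_type": "B"},
    {"sample_id": "b2", "cell_type": "B"},
]
signatures = {
    "a1": {101, 202, 999},
    "a2": {101, 202, 999},
    "b1": {303, 404, 999},
    "b2": {303, 404, 999},
}
result = enriched_hashes(metadata, signatures, "cell_type")
print({g: sorted(h) for g, h in result.items()})  # {'A': [101, 202], 'B': [303, 404]}
```

Hash 999 appears in every sample, so it does not separate the groups and gets a coefficient near zero. The hashes selected per group would then be the input to the hash2kmer and DIAMOND blastp steps.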


olgabot commented 4 years ago

FYI, I'm getting exit code 139 when running this on data "in the wild": https://groups.google.com/forum/?nomobile=true#!topic/nextflow/L9r_cZYf5lY
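Exit code 139 follows the common shell convention of reporting a process killed by signal N as 128 + N. So 139 - 128 = 11, which is SIGSEGV (a segmentation fault) on Linux. The following hypothetical helper decodes such codes with Python's standard `signal` module:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a shell exit status, assuming the 128 + N convention for signals."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            # Not a known signal number on this platform.
            return f"exited with status {code}"
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {code}"


print(describe_exit_code(139))  # killed by SIGSEGV (signal 11)
```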

GitHub gist of the errors: https://gist.github.com/olgabot/e64b597b0eb6df3996cb196f0b41a13d

olgabot commented 4 years ago

It turned out to be a ulimit ("too many open files") problem: https://unix.stackexchange.com/questions/85457/how-to-circumvent-too-many-open-files-in-debian/85458#85458

I ran:

ulimit -Sn 99999
sudo sh -c "ulimit -Hn 99999 ; exec su \"$USER\""

I also reduced the input CSV to only ~2000 signatures/samples, which solved the problem.
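The `ulimit -Sn` command raises the soft limit on open file descriptors for the current shell. As a sketch, a Python process can do the same for itself with the standard-library `resource` module (Unix only). The helper name `raise_open_file_limit` and the 99999 target are illustrative. Raising the hard limit still requires root, which is what the `sudo` line handles.

```python
import resource


def raise_open_file_limit(target: int = 99999) -> tuple:
    """Raise the soft RLIMIT_NOFILE toward `target`, never above the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may not exceed the hard limit (unless the hard limit is unlimited).
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    # Only ever raise the limit; an unlimited soft limit needs no change.
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # Some platforms reject values above a system-wide cap; keep the old limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = raise_open_file_limit()
print(f"open-file limit: soft={soft}, hard={hard}")
```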