tkosciol closed 7 years ago
65% of the errors I'm getting in the pipeline are caused by this issue, so it should be high priority. The remaining 35% are most likely scheduler problems generating I/O conflicts, which I need to discuss with Jeff.
example on Barnacle in: /projects/microprot/benchmarking/snakemake_test/MSA_ripe_error
@sjanssen2
I don't like the pre-filtering, because that would change the value of Neff
solved by PR #54
It's not a `calculate_Neff` error directly, but given the development cycle for skbio, we need to find a workaround here first. If there are duplicate headers in the MSA file (which will happen, because (1) HHsuite trims headers to a number of characters, and (2) we'll be running HHblits against 2 databases which may contain duplicate entries), it gives an error:
One workaround would be to get rid of the headers altogether, because we're only interested in the number of sequences anyway. Another option is to prefilter the MSA for redundant sequences (if that is the case) before calculating the distance matrix.
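
A rough sketch of both workarounds in plain Python, for discussion only. It assumes the MSA can be read as FASTA-style text (`>` header lines followed by sequence lines). The helper names `read_fasta`, `renumber_headers` and `drop_duplicate_sequences` are hypothetical and not part of microprot or skbio.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-style text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def renumber_headers(records):
    """Workaround 1: discard the original headers and use a running index,
    so every record gets a unique ID and no sequence is removed."""
    return [(str(i), seq) for i, (_, seq) in enumerate(records)]


def drop_duplicate_sequences(records):
    """Workaround 2: keep only the first record for each distinct sequence.
    Note that this changes the number of sequences in the MSA."""
    seen = set()
    unique = []
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            unique.append((header, seq))
    return unique


# Toy input (made up for illustration): two records share the header "seqA",
# and two records share the sequence "ACD-E".
example = """>seqA
ACD-E
>seqA
ACDFE
>seqB
ACD-E
"""

records = list(read_fasta(example))
renumbered = renumber_headers(records)
filtered = drop_duplicate_sequences(records)
```

In this toy input, renumbering keeps all three sequences and only makes the IDs unique. Filtering by sequence removes one record but still leaves two records named `seqA`, so on its own it does not guarantee unique headers.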