BDI-pathogens / phyloscanner

Phylogenetics between and within hosts at once, all along the genome.
GNU General Public License v3.0

Problems with blacklisting #24

Closed: MarcNiebel closed this issue 6 years ago

MarcNiebel commented 6 years ago

I am trying to run the analysis part of phyloscanner, using the duplication data for blacklisting. My command is:

phyloscanner_analyse_trees.R third_iteration/trees_thirditeration/FastTree RunWithBlacklistingrwt2 s,20 --duplicateBlacklist third_iteration/DuplicateReadCountsProcessedInWindow --rawBlacklistThreshold 2 --outgroupName '1|a|AF009606' --tipRegex "^(.*)_([0-9]+)read([0-9]+)count([0-9]+)$" --normRefFileName branch_length_normalisation.csv_ByPosition.csv --multifurcationThreshold g --verbose
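One quick way to see what the --tipRegex above extracts is to apply it to a tip label outside phyloscanner. The sketch below does this with Python's re module; the tip labels are hypothetical, made up for illustration. Note that this regex has four capture groups (a host ID followed by three numbers), so it may be worth checking that this is what phyloscanner expects of a tip regex.

```python
import re

# The --tipRegex value from the command above (four capture groups).
TIP_REGEX = re.compile(r"^(.*)_([0-9]+)read([0-9]+)count([0-9]+)$")


def parse_tip(label):
    """Return the capture groups for a tip label, or None if it does not match."""
    m = TIP_REGEX.match(label)
    return m.groups() if m else None


# Hypothetical tip labels, for illustration only.
for label in ["PatientA_1read5count12", "PatientA_read_5_count_12"]:
    print(label, "->", parse_tip(label))
```

The first label splits into a host ID and three numbers; the second, which uses a different naming style, does not match at all, so such a tip would not be recognised as belonging to a host.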

The error reads:

Error in phyloscanner.analyse.trees(tree.directory, tree.file.regex, reconstruction.mode, : Tree files identifiers and duplicate file identifiers do not match at all. Check file prefixes are correct.
Execution halted

My tree file names have the structure FastTree_<window coordinates>.tree, and my duplicate read files have the format given above (DuplicateReadCountsProcessed_InWindow_<window coordinates>.csv). I have tried renaming both sets of files, without success.

Any help would be much appreciated.

Marc

ChrisHIV commented 6 years ago

Hi Marc, if you subtract what you specified for your tree file regex from an example tree file name, does the remainder match what you get by subtracting what you specified for your duplicate file regex from the corresponding duplicate file name? If not, does tweaking the regex so that they match fix it? If not, can you give an explicit example of one tree file name and one duplicate file name, and I'll try to massage the regex.
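The check Chris describes can be tried by hand on a couple of file names. The sketch below strips a prefix and the extension from each name and reports which identifiers the tree files and duplicate files have in common. It only illustrates the idea and is not phyloscanner's own matching code; the file names and window coordinates are hypothetical, following the pattern Marc describes.

```python
import os

# Hypothetical file names following the naming scheme described in the issue,
# with made-up window coordinates.
TREE_FILES = ["FastTree_800_to_1049.tree", "FastTree_1000_to_1249.tree"]
DUP_FILES = [
    "DuplicateReadCountsProcessed_InWindow_800_to_1049.csv",
    "DuplicateReadCountsProcessed_InWindow_1000_to_1249.csv",
]


def identifier(filename, prefix):
    """Drop the extension, then the prefix; return None if the prefix is absent."""
    stem, _ext = os.path.splitext(filename)  # the extension is ignored
    if not stem.startswith(prefix):
        return None
    return stem[len(prefix):]


def shared_ids(tree_prefix, dup_prefix):
    """Identifiers left over in both the tree files and the duplicate files."""
    trees = {identifier(f, tree_prefix) for f in TREE_FILES} - {None}
    dups = {identifier(f, dup_prefix) for f in DUP_FILES} - {None}
    return trees & dups


print(shared_ids("FastTree_", "DuplicateReadCountsProcessed_InWindow_"))
print(shared_ids("FastTree", "DuplicateReadCountsProcessed_InWindow_"))
```

With matching prefixes both sets of files reduce to the same window coordinates. Dropping the trailing underscore from the tree prefix leaves _800_to_1049 on one side and 800_to_1049 on the other, so nothing matches, which is the "do not match at all" situation in the error.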

mdhall272 commented 6 years ago

(Also note that the file extension does not matter.)

MarcNiebel commented 6 years ago

Hi Chris and Matthew,

Thanks for that. I had not specified the filename regex; after a little tinkering with it, the problem is sorted.
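Marc does not say which regex he ended up using. As a hedged illustration of the general idea, the sketch below uses a hypothetical regex with capture groups for the window start and end, so that the differing prefixes and extensions (which, as noted above, do not matter) drop out and tree files pair up with duplicate files on the coordinates alone. The X_to_Y coordinate format and the file names are assumptions made for illustration; this is not phyloscanner's own code.

```python
import re

# Hypothetical filename regex: any non-digit prefix, then two capture groups
# for the window start and end, then any non-digit suffix (e.g. the extension).
WINDOW_REGEX = re.compile(r"^\D*([0-9]+)_to_([0-9]+)\D*$")


def window_key(filename):
    """Return (start, end) for a matching file name, else None."""
    m = WINDOW_REGEX.match(filename)
    return (int(m.group(1)), int(m.group(2))) if m else None


def pair_by_window(tree_files, dup_files):
    """Map each window to (tree file, duplicate file); None if no duplicate file."""
    dups = {window_key(f): f for f in dup_files}
    pairs = {}
    for f in tree_files:
        key = window_key(f)
        if key is not None:
            pairs[key] = (f, dups.get(key))
    return pairs


print(pair_by_window(
    ["FastTree_800_to_1049.tree"],
    ["DuplicateReadCountsProcessed_InWindow_800_to_1049.csv"],
))
```

Keying on the captured numbers rather than on a leftover string means a stray underscore or a different extension no longer breaks the pairing, as long as both naming schemes contain the coordinates in the same form.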

Marc