Closed magosil86 closed 5 years ago
I don't think that command line can be right? --scriptDir
as an argument hasn't worked since before phyloscannerR existed!
Nevertheless, I've fixed summary_statistics.R. Those scripts in /tools/ really are held together with sellotape at this point, however, and this experience has convinced me they should go ASAP.
Many thanks for the fix! Before you do away with the old scripts, quick question, the ptyr5_patStatsFull.csv produced by summary_statistics.R does not include file.suffix and id fields which are required by TransmissionSummary.R is there a possible work around or fix for this?
Could you attach an example of that file? And do you definitely mean TransmissionSummary.R not transmission_summary.R?
Yeah, I wondered too why the Phyloscanner.R.utilities workflow specified TransmissionSummary.R rather than transmission_summary.R as the former is depracated, here is the error and I have attached the offending file pty_19-04-28-19-03-59.zip
Rscript ~/phyloscanner-master/deprecated/TransmissionSummary.R pty_19-04-28-19-03-59/ptyr5_patients.txt pty_19-04-28-19-03-59/ptyr5_classification_InWindow_ pty_19-04-28-19-03-59/ptyr5_patStatsFull.csv --minThreshold 1 --detailedOutput pty_19-04-28-19-03-59/ptyr5_trmStatsPerWindow.rda --allowMultiTrans --verbose
Getting window counts per patient from pty_19-04-28-19-03-59/ptyr5_patStatsFull.csv...
Error: The following required columns are missing from the summary csv file: file.suffix, id. Quitting.
Execution halted
That's just because it's old. We're working on it!
If I remember rightly it should work if you change the column names; host.id to id and tree.id to file.suffix.
Ah, I see. Renaming the host.id and tree.id fields helps but it now reports:
Error in which(path.classification == "anc") :
object 'path.classification' not found
Calls: lapply ... FUN -> set -> [ -> [.data.table -> eval -> eval -> which
Execution halted
Could you see what happens if you use transmission_summary.R instead? That at least should be current and working.
Sure, here is the output
Rscript ~/phyloscanner-master/tools/transmission_summary.R pty_19-04-28-19-03-59/ptyr5_classification_InWindow_ pty_19-04-28-19-03-59/ptyr5_trmStats.csv --minThreshold 1 --allowMultiTrans --verbose
Error: All columns in a tibble must be 1d or 2d objects:
* Column `tree.id` is NULL
Backtrace:
█
1. └─phyloscannerR::summarise.classifications(...)
2. └─phyloscannerR::merge.classifications(ptrees, allow.mt, verbose)
3. └─`%>%`(...)
4. ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
5. └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
6. └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
7. └─phyloscannerR:::`_fseq`(`_lhs`)
8. └─magrittr::freduce(value, `_function_list`)
9. ├─base::withVisible(function_list[[k]](value))
10. └─function_list[[k]](value)
11. └─purrr::map(...)
12. └─phyloscannerR:::.f(.x[[i]], ...)
13. └─tt %>% add_column(
Execution halted
Was that with the renamed columns? Un-rename them if so.
It was with the original names: host id and tree.id, that said, I just tried it the other way around with id and file.suffix and it gives the same error:
Error: All columns in a tibble must be 1d or 2d objects:
* Column `tree.id` is NULL
Backtrace:
█
1. └─phyloscannerR::summarise.classifications(...)
2. └─phyloscannerR::merge.classifications(ptrees, allow.mt, verbose)
3. └─`%>%`(...)
4. ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
5. └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
6. └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
7. └─phyloscannerR:::`_fseq`(`_lhs`)
8. └─magrittr::freduce(value, `_function_list`)
9. ├─base::withVisible(function_list[[k]](value))
10. └─function_list[[k]](value)
11. └─purrr::map(...)
12. └─phyloscannerR:::.f(.x[[i]], ...)
13. └─tt %>% add_column(
Execution halted
The classification file ptyr5_classification_InWindow_1525_to_1774.csv is empty. What are the earlier commands you used?
Here are the earlier commands as outlined in Phyloscanner.R.utilities:
Rscript ~/phyloscanner-master/tools/parsimony_based_blacklister.R 20 0 20 "pty_19-04-28-19-03-59/ptyr5_InWindow_" "pty_19-04-28-19-03-59/ptyr5_blacklistsank_InWindow" --dualsOutputFile "pty_19-04-28-19-03-59/ptyr5_duallistsank_InWindow" --outgroupName C.BW.00.00BW07621.AF443088 --tipRegex "^(.*)-fq[0-9]+_read_([0-9]+)_count_([0-9]+)$" --multifurcationThreshold 1e-05 --branchLengthNormalisation "pty_19-04-28-19-03-59/ptyr5_normconst.csv" --verbose
Rscript ~/phyloscanner-master/tools/downsample_reads.R 50 pty_19-04-28-19-03-59/ptyr5_ pty_19-04-28-19-03-59/ptyr5_blacklistdwns_ --blacklist pty_19-04-28-19-03-59/ptyr5_blacklistsank_ --tipRegex "^(.*)-fq[0-9]+_read_([0-9]+)_count_([0-9]+)$" --seed 42 --verbose
Rscript ~/phyloscanner-master/tools/split_hosts_to_subgraphs.R "pty_19-04-28-19-03-59/ptyr5_" "ptyr5" --blacklist "pty_19-04-28-19-03-59/ptyr5_blacklistdwns_" --outputdir "pty_19-04-28-19-03-59" --idFile "pty_19-04-28-19-03-59/ptyr5_patients.txt" --outgroupName C.BW.00.00BW07621.AF443088 --splitsRule s --kParam 20 --proximityThreshold 0 --readCountsMatterOnZeroBranches --tipRegex "^(.*)-fq[0-9]+_read_([0-9]+)_count_([0-9]+)$" --multifurcationThreshold 1e-05 --branchLengthNormalisation "pty_19-04-28-19-03-59/ptyr5_normconst.csv" --outputAsRDA --pdfwidth 30 --pdfrelheight 0.15 --verbose
Rscript ~/phyloscanner-master/tools/classify_relationships.R "pty_19-04-28-19-03-59/ProcessedTree_s_ptyr5_" "pty_19-04-28-19-03-59/subgraphs_s_ptyr5_" "pty_19-04-28-19-03-59/ptyr5" --branchLengthNormalisation "pty_19-04-28-19-03-59/ptyr5_normconst.csv" --verbose
Rscript ~/phyloscanner-master/tools/summary_statistics.R "pty_19-04-28-19-03-59/ptyr5_patients.txt" "pty_19-04-28-19-03-59/ProcessedTree_s_ptyr5_InWindow_" "pty_19-04-28-19-03-59/subgraphs_s_ptyr5_InWindow_" "pty_19-04-28-19-03-59/ptyr5_" --tipRegex "^(.*)-fq[0-9]+_read_([0-9]+)_count_([0-9]+)$" --blacklists "pty_19-04-28-19-03-59/ptyr5_blacklistdwns_InWindow_" --verbose
Rscript ~/phyloscanner-master/tools/transmission_summary.R "pty_19-04-28-19-03-59/ptyr5_classification_InWindow_" "pty_19-04-28-19-03-59/ptyr5_trmStats.csv" --minThreshold 1 --allowMultiTrans --verbose
Additionally, here is the output from classify_relationships.R:
+ Rscript ~/phyloscanner-master/tools/classify_relationships.R pty_19-04-28-19-03-59/ProcessedTree_s_ptyr5_ pty_19-04-28-19-03-59/subgraphs_s_ptyr5_ pty_19-04-28-19-03-59/ptyr5 --branchLengthNormalisation pty_19-04-28-19-03-59/ptyr5_normconst.csv --verbose
Reading tree file pty_19-04-28-19-03-59/ProcessedTree_s_ptyr5_InWindow_1525_to_1774.tree...
Reading annotations...
Reading splits file pty_19-04-28-19-03-59/subgraphs_s_ptyr5_InWindow_1525_to_1774.csv ...
Parsed with column specification:
cols(
host = col_character(),
subgraph = col_character(),
tip = col_character()
)
Collecting tips for each host...
Collapsing subgraphs...
Identifying pairs of unblocked splits...
|==================================================| 100%
Calculating pairwise distances between splits...
|==================================================| 100%
Testing pairs...
|==================================================| 100%
Writing to file pty_19-04-28-19-03-59/ptyr5_classification_InWindow_1525_to_1774.csv ...
It's not especially clear from the console output (it would be using phyloscanner_analyse_trees.R) but the problem here is that there is practically no NGS data there. There is only one host in that tree (everything else is a reference), and since you're asking it for pairwise relationships between hosts, the scripts are a bit confused. What's the origin of the data?
The data is paired-end short reads from the IVA publication dataset that was used in the phyloscanner practical illustrating application to a real HIV dataset. I see what you mean with little data, but here are some examples from a different run with hosts (ERR732112 and ERR732075) that produces the same error Tree_ptyr2_InWindow_1550_to_1799.pdf Tree_ptyr2_InWindow_1525_to_1774.pdf
An update... solved. It appears TransmissionSummary.R expected the column name for ancestry values (e.g. anc, none, desc, multiAnc) in the classification files to be 'path.classification' rather than 'ancestry' hence an update to the column name resolved the issue. Many thanks!
An update... solved. It appears TransmissionSummary.R expected the column name for ancestry values (e.g. anc, none, desc, multiAnc) in the classification files to be 'path.classification' rather than 'ancestry' hence an update to the column name resolved the issue. Many thanks!
Is this also relevant to transmission_summary.R?
Not quite, the error in transmission_summary.R seems to be triggered by a call to summarise.classifications which subsequently calls merge.classifications in collapsed_tree_methods.R. The line: tt <- tt %>% add_column(tree.id = ptree$id)
in merge.classifications expects a field called 'id' in ptree but the structure of ptree (as obtained via: str(ptree)) shows the following fields:
List of 2
$ classification.file.name: chr "trmstats_TransmissionSummary_script/ptyr2_trmStats/ptyr2_classification_InWindow_1000_to_1249.csv"
$ suffix : chr "1000_to_1249"
hence updating the: tt <- tt %>% add_column(tree.id = ptree$id)
in merge.classifications to: tt <- tt %>% add_column(tree.id = ptree$suffix)
resolved the issue.
The merge.classifications function is as I want it and it was the old scripts that hadn't been updated to use "tree.id" rather than "suffix" that were at fault. I've fixed this.
Hi Matthew,
I get an error with phyloscanner's summary_statistics.R script, it looks like the error comes from a call to the function: calc.all.stats.in.window, have I missed something here, any suggestions? Please note that I intend to switch to just phyloscannerR going forward, I just wanted to complete the phyloscanner.R.utilities work flow first (i.e. summary_statistics.R and TransmissionSummary.R scripts). Attached is some data to help reproduce the problem pty_19-04-28-19-03-59.zip
Session info