Closed kfletcher88 closed 5 years ago
Yes, it's fine to continue with the message.
It's likely that results of convertToFasta are correct. They are after Bonferroni correction and not just the ones smaller than 0.05. No entries in control_kmers.fasta could be due to small number of control samples or the phenotype due to presence of a gene in the cases.
No particular reason. If number of k-mers found significant after the R script is very high (close to 200,000) then 200,000 should be increased. It can be decreased to save time.
Thanks for the clarification!
Hi, Thanks for this exciting repository.
I was using the runHAWK script on some of my own data and the provided e.coli examples. I run in to the same error on both data sets at the loop https://github.com/atifrahman/HAWK/blob/master/log_reg_case.R#L66-L109
It always exits on
However the files pvals_case_top.txt and pvals_control_top.txt do have 200,000 lines each, so I assume I am ok to continue with this error message?
If so, I have another query. I am not sure if the execution of converToFasta is correct. After running runHAWK case_kmers.fasta contained 5664 k-mers while control_kmers.fasta contained 0. Is this correct? I didn't recieve any error messages suggesting that is terminated prematurely and the rm commands at the end of runHAWK were executed.
These numbers don't seem in line with what is significant in the two pvals*top_merged_sorted.txt files.
Finally, is there a reason that only the top 200,000 kmers from each *out_w_bonf_sorted.kmerDiff are used? Would there ever be a situation where you think this should be increased or decreased?
Thanks in advance for you help!