alvinwt opened 4 months ago
Hi Alvin,
We loop over the file list 4 files at a time because the previous script generates four different bed files for each .vcf file. In the next for loop, those four bed files (GENEBODY, REPLISEQ, SIMPLEREPEAT, TSB) are loaded.
See below a picture of the output files present in the `files_snv_test_all` object.
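Below is a minimal Python sketch of the step-of-4 loop described above (the actual script is R). It assumes hypothetical file names of the form `<sample>_<FEATURE>.bed` and that sorting keeps each sample's four files next to each other; `chunk_bed_files` is an illustrative name, not a function from the repository. It also checks every chunk, because stepping by index silently mispairs files if the list length is not a multiple of 4 or the ordering differs from what the loop expects.

```python
from pathlib import PurePath
from typing import Dict, List

# The four bed files generated per .vcf file (names as given in the thread).
FEATURES = ("GENEBODY", "REPLISEQ", "SIMPLEREPEAT", "TSB")


def chunk_bed_files(files: List[str]) -> Dict[str, Dict[str, str]]:
    """Walk a sorted file list in steps of 4, one step per .vcf file.

    Assumes hypothetical names like '<sample>_<FEATURE>.bed'.
    Returns {sample: {feature: file_name}} and raises ValueError
    if any step does not contain exactly one sample's four files.
    """
    step = len(FEATURES)
    files = sorted(files)
    if len(files) % step != 0:
        raise ValueError(f"{len(files)} files is not a multiple of {step}")

    samples: Dict[str, Dict[str, str]] = {}
    for start in range(0, len(files), step):
        chunk = files[start:start + step]
        parsed = []
        for name in chunk:
            # 's1_TSB.bed' -> stem 's1_TSB' -> ('s1', '_', 'TSB')
            sample, _, feature = PurePath(name).stem.rpartition("_")
            parsed.append((sample, feature, name))

        sample_ids = {s for s, _, _ in parsed}
        features = {f for _, f, _ in parsed}
        # One sample and all four features means the step lines up correctly.
        if len(sample_ids) != 1 or features != set(FEATURES):
            raise ValueError(f"files {start}..{start + step - 1} do not form one sample: {chunk}")
        samples[sample_ids.pop()] = {f: n for _, f, n in parsed}
    return samples
```

For example, passing the eight files for two samples `s1` and `s2` in any order returns one entry per sample, while a list with a missing or extra file raises instead of pairing the wrong bed files together.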
I hope this is clear; if not, let me know.
Hi,
I am trying to classify my VCF with your random forest (RF) classifier and am having some difficulty with the code. Are you splitting the regions into intervals of 4 and subsetting the file list using those intervals? There might be a bug here, and I'm not sure I am understanding this correctly.
Best wishes, Alvin
https://github.com/ProjectsVanBox/colibactin_detection/blob/8cd87b30b4a50f03c50b3df71be9dd39f3ccca2b/randomForest/3_parseBedFiles.R#L9C1-L23C54
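If index-based intervals are the suspected bug, one alternative (not what the linked script does) is to group the bed files by sample name instead of by position, so the result no longer depends on how the file list happens to be ordered. A minimal Python sketch, again assuming hypothetical `<sample>_<FEATURE>.bed` names; `group_bed_files` is illustrative only:

```python
from collections import defaultdict
from pathlib import PurePath
from typing import Dict, List

# The four bed files expected per .vcf file (names as given in the thread).
FEATURES = ("GENEBODY", "REPLISEQ", "SIMPLEREPEAT", "TSB")


def group_bed_files(files: List[str]) -> Dict[str, Dict[str, str]]:
    """Group hypothetical '<sample>_<FEATURE>.bed' files by sample name.

    Files whose suffix is not one of FEATURES are ignored; a sample
    missing any of the four features raises ValueError.
    """
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in files:
        sample, _, feature = PurePath(name).stem.rpartition("_")
        if feature not in FEATURES:
            continue  # not one of the four annotation bed files
        groups[sample][feature] = name

    incomplete = {
        sample: sorted(set(FEATURES) - set(found))
        for sample, found in groups.items()
        if set(found) != set(FEATURES)
    }
    if incomplete:
        raise ValueError(f"samples missing bed files: {incomplete}")
    return dict(groups)
```

Because each file is keyed by the sample and feature parsed from its own name, an unexpected extra file or an unusual sort order cannot shift the pairing the way a fixed step of 4 can.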