ProjectsVanBox / colibactin_detection


Difficulty classifying my VCFS #2

Open alvinwt opened 4 months ago

alvinwt commented 4 months ago

Hi,

I am trying to classify my VCF with your RF classifier and I am having some difficulty with the code. Are you splitting the regions into 4-base intervals and subsetting the file list using those intervals? There might be a bug here, but I'm not sure I am understanding this correctly.

Best wishes, Alvin

https://github.com/ProjectsVanBox/colibactin_detection/blob/8cd87b30b4a50f03c50b3df71be9dd39f3ccca2b/randomForest/3_parseBedFiles.R#L9C1-L23C54

AxelRosendahlHuber commented 1 month ago

Hi Alvin,

We are looping over the file list in steps of four files at a time, because four different bed files are generated for each .vcf file in the previous script. In the next for loop, the four bed files (GENEBODY, REPLISEQ, SIMPLEREPEAT, TSB) are loaded.
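As a rough illustration of that stepping pattern (a minimal sketch, not the actual code from 3_parseBedFiles.R): the file names below are made up, and the sketch assumes the list is ordered so that the four bed files of each .vcf sit next to each other.

```r
# Hypothetical file list: four bed files per .vcf, ordered so that the
# files of one sample are adjacent (names are invented for illustration).
files_snv_test_all <- c(
  "sampleA_GENEBODY.bed", "sampleA_REPLISEQ.bed",
  "sampleA_SIMPLEREPEAT.bed", "sampleA_TSB.bed",
  "sampleB_GENEBODY.bed", "sampleB_REPLISEQ.bed",
  "sampleB_SIMPLEREPEAT.bed", "sampleB_TSB.bed"
)

# Start index of each block of four files: 1, 5, 9, ...
starts <- seq(1, length(files_snv_test_all), by = 4)

groups <- list()
for (i in starts) {
  # The four bed files that belong to one .vcf
  bed_files <- files_snv_test_all[i:(i + 3)]
  # Derive a sample name from the first file of the block (assumed naming scheme)
  sample_name <- sub("_GENEBODY\\.bed$", "", bed_files[1])
  groups[[sample_name]] <- bed_files
}

str(groups)
```

So the step of four refers to positions in the file list, not to genomic coordinates. If the list is not ordered per sample, or a sample is missing one of its four files, the blocks would be misaligned, so that is worth checking first.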

See below a picture of the output files present in the files_snv_test_all object.

[image: image001]

I hope this is clear. If not, let me know.