ML-Bioinfo-CEITEC / genomic_benchmarks

Benchmarks for classification of genomic sequences
Apache License 2.0
118 stars, 17 forks

Human nontata promoters #2

Closed: simecek closed 3 years ago

simecek commented 3 years ago

This is a new benchmark, the first one on a list we discussed last week.

katarinagresova commented 3 years ago

Rebased changes from main into this branch. Fixed a few path problems: home was referenced with an absolute path instead of a relative one.

Question 1: why is the demo for seq2loc (genomic_benchmarks/seq2loc/demo/create_datasets.ipynb) the same as the script for creating the dataset (docs/human_nontata_promoters/create_datasets.ipynb)?

Question 2: can we adapt loc2seq into ensembl_scraper? Right now we have many Ns in the sequences, which is not desired.
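To make the concern about Ns concrete, here is a minimal sketch of how the share of N bases per sequence could be measured and used as a filter. The helper names `n_fraction` and `filter_by_n_content`, the 0.1 threshold, and the example sequences are all assumptions for illustration; none of them come from genomic_benchmarks or ensembl_scraper.

```python
def n_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are N (case-insensitive)."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def filter_by_n_content(seqs, max_fraction=0.0):
    """Keep only sequences whose N fraction is at most max_fraction."""
    return [s for s in seqs if n_fraction(s) <= max_fraction]


# Hypothetical example sequences, not taken from the benchmark.
example = ["ACGTACGT", "ACNNACGT", "NNNNNNNN"]

# The 0.1 threshold is an arbitrary choice for this sketch.
kept = filter_by_n_content(example, max_fraction=0.1)
print(kept)
```

A check like this only reports or drops affected sequences; it does not address where the Ns come from.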

simecek commented 3 years ago

Perfect. Thanks a lot.

Q1: This is just a mistake, caused by the fact that I originally started to write a demo and then moved the file. I will fix it and merge.

Q2: I am not sure I understand. I was thinking about porting ensembl_scraper into tools here, but I am not sure how difficult it would be or how many dependencies and large files it would add. Anyway, that is not connected to this PR. Let us solve it separately.