Sum02dean / MLG

Machine Learning in Genomics Course ETH
MIT License

Allow user to specify directory #33

Closed: Sum02dean closed this issue 2 years ago

Sum02dean commented 2 years ago

https://github.com/Sum02dean/MLG/blob/7fdd276a8bbc2054f58803bcf4af53049ccf226a/task_1/utils/data_loader.py#L9-L10

Hard-coding the directory path like this seems limiting. I am having issues using your function from within the notebooks directory, where I would need to specify the data path as '../data/'.

Suggestion:

 import os
 import pandas as pd
 def load_info(path_to_dir: str, filename: str) -> pd.DataFrame:
     return pd.read_csv(os.path.join(path_to_dir, 'CAGE-train', f'{filename}.tsv'), sep='\t')

LiineKasak commented 2 years ago

Where exactly did you encounter this? By default, inside a module (here task_1), isn't the current working directory always at module level? For example, if I run the main block, the working directory is task_1 even though the file is in the utils directory.

Was it in a notebook, or what happened?
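Relative paths such as '../data/' are resolved against the process's current working directory, not against the location of the .py file, and Jupyter typically starts a notebook's kernel in the notebook's own folder. That is why the same call can behave differently from task_1/ and task_1/notebooks/. One CWD-independent option is sketched below, assuming the data folder is literally named data and sits in the starting directory or one of its ancestors; find_data_dir is a hypothetical helper, not part of the repository.

```python
from __future__ import annotations

from pathlib import Path


def find_data_dir(start: str | Path = ".", name: str = "data") -> Path:
    """Return the first ``name`` directory found in ``start`` or one of its parents.

    Hypothetical helper: lets the same code run from task_1/ or
    task_1/notebooks/ without hard-coding '../data/'.
    """
    start = Path(start).resolve()
    # Check the starting directory first, then walk up towards the filesystem root.
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"no '{name}' directory found above {start}")
```

Called from task_1/notebooks/, find_data_dir() would return task_1/data if that folder exists, so the result could be passed as the directory argument of a loader like the one suggested above.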

Sum02dean commented 2 years ago

Yes, in a notebook, e.g. task_1/notebooks/some_file.ipynb.

LiineKasak commented 2 years ago

Hmm, maybe we should keep the notebooks at module level then, since they're not as smart as .py files at understanding modules? Or simply make the notebook believe it's at module level by running, for example, %cd .. in its first cell?
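A plain-Python variant of the %cd .. trick is sketched below, assuming the notebooks live in a folder literally named notebooks directly under the module, as in task_1/notebooks/; move_to_module_level is a hypothetical helper name. A bare %cd .. moves up one level every time the cell is re-run, so this version only changes directory when the current one is the notebooks folder.

```python
import os
from pathlib import Path


def move_to_module_level(notebook_dir_name: str = "notebooks") -> Path:
    """Move the working directory up one level if it is the notebooks folder.

    Hypothetical helper: a guarded counterpart of running ``%cd ..`` in the
    first notebook cell. It is safe to re-run because it only moves up when
    the current directory is named ``notebook_dir_name``.
    """
    cwd = Path.cwd()
    if cwd.name == notebook_dir_name:
        os.chdir(cwd.parent)
    return Path.cwd()
```

Running it in the first cell of task_1/notebooks/some_file.ipynb would leave the kernel in task_1, where the existing relative paths resolve as they do for the .py files.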

Sum02dean commented 2 years ago

Perfect. The %cd trick worked!

TaoDFang commented 2 years ago

Here I ran into another issue while creating a GCT expression file for IGV. For this purpose, the train and validation data each need their own separate files, whereas for now the train and validation data are always merged together and referred to as "train" data. I think it's better to load them separately, or to add another column specifying whether each row comes from the training or the validation dataset? And as we discussed yesterday, we will just use the default training and validation datasets, so this label information will also be useful later when training models?
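The "extra column" option can be sketched with pandas as below, assuming the train and validation tables are already loaded as DataFrames with the same columns; the column name split and the helper names merge_with_split and split_frames are hypothetical. Each row keeps its origin after merging, and the merged table can be split back into one frame per dataset when separate files are needed.

```python
import pandas as pd


def merge_with_split(train: pd.DataFrame, val: pd.DataFrame) -> pd.DataFrame:
    """Concatenate train and validation rows, tagging each row's origin in a 'split' column."""
    return pd.concat(
        [train.assign(split="train"), val.assign(split="val")],
        ignore_index=True,
    )


def split_frames(merged: pd.DataFrame) -> dict[str, pd.DataFrame]:
    """Recover one DataFrame per split, dropping the helper column again."""
    return {
        name: group.drop(columns="split").reset_index(drop=True)
        for name, group in merged.groupby("split")
    }
```

Writing each frame to its own file (for example with DataFrame.to_csv) would give the separate train and validation inputs; converting them to the GCT layout itself is not shown here.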