Sum02dean / MLG

Machine Learning in Genomics Course ETH
MIT License

Explore the data - basic EDA #16

Open Sum02dean opened 2 years ago

Sum02dean commented 2 years ago

We need to understand the data and how we might represent it in a form suitable for machine learning.

TaoDFang commented 2 years ago

I imported the BED data into IGV. Why do all the signals display at the same height? Also, the ".bed" files have no strand direction.

To import expression data into IGV, we need to merge the info data and the value data: https://software.broadinstitute.org/software/igv/ExpressionData
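A minimal sketch of that merge, assuming the info data maps each gene to a genomic position and the value data maps each gene to one expression value per sample. The function name `build_gct` and the input layout are hypothetical, not taken from the repo. The header lines follow the GCT 1.2 layout (`#1.2`, then row and column counts, then the column names); the `|@chr:start-end|` locus annotation in the Description column is how I understand the linked IGV page, so please check it there before relying on it.

```python
def build_gct(info, values, sample_names):
    """Return GCT 1.2 text built from two per-gene tables.

    info:         dict gene_name -> (chrom, start, end)      (hypothetical layout)
    values:       dict gene_name -> list of expression values (hypothetical layout)
    sample_names: column names, one per expression value
    """
    # Keep only genes that appear in both tables, in the order of `info`.
    genes = [gene for gene in info if gene in values]

    lines = [
        "#1.2",                                    # GCT version line
        f"{len(genes)}\t{len(sample_names)}",      # number of rows, number of samples
        "\t".join(["Name", "Description", *sample_names]),
    ]

    for gene in genes:
        if len(values[gene]) != len(sample_names):
            raise ValueError(f"{gene}: expected {len(sample_names)} values")
        chrom, start, end = info[gene]
        # Assumed locus annotation so IGV can place the row on the genome.
        description = f"na|@{chrom}:{start}-{end}|"
        row = [gene, description, *(str(v) for v in values[gene])]
        lines.append("\t".join(row))

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    info = {"geneA": ("chr1", 100, 200), "geneB": ("chr2", 5, 50)}
    values = {"geneA": [1.5, 2.0], "geneB": [0.0, 7.25]}
    print(build_gct(info, values, ["sample1", "sample2"]), end="")
```

Writing the returned text to a file with a `.gct` extension should be enough for IGV to recognize it as expression data.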

LiineKasak commented 2 years ago

As I understand it, bed files are supposed to be the same height? It just shows the peaks by some measure...

For bigwig files, don't forget to autoscale in IGV :)

For the strand direction I have no clue..

TaoDFang commented 2 years ago

I'm not sure BED files are supposed to have the same height. They mainly show the positions of peaks, but they also have a "score" or "signalValue" column that carries height information.

For expression data, IGV seems to use different colors to represent values, but for BED files I don't know.

By the way, I also created a GCT file so we can import the expression data into IGV too. I'm not sure where to put this file so you guys can try it as well, since we currently ignore all files in the "data" folder.
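One way to check this outside IGV is to look at the columns directly. Below is a rough sketch, assuming the files are tab-separated and follow the ENCODE narrowPeak column order (chrom, start, end, name, score, strand, signalValue, ...); the function name `summarize_peaks` is made up for illustration. If the score range turns out to be a single value and the strand column is always ".", that would be consistent with flat tracks and no strand information.

```python
def _to_float(text):
    """Convert a field to float, returning None for placeholders like '.'."""
    try:
        return float(text)
    except ValueError:
        return None


def _value_range(numbers):
    """Return (min, max) of a list, or None if the list is empty."""
    return (min(numbers), max(numbers)) if numbers else None


def summarize_peaks(lines):
    """Summarize score (col 5), strand (col 6) and signalValue (col 7)."""
    scores, signals, strands = [], [], set()
    n_peaks = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and UCSC-style header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        n_peaks += 1
        if len(fields) >= 5:
            score = _to_float(fields[4])
            if score is not None:
                scores.append(score)
        if len(fields) >= 6:
            strands.add(fields[5])
        if len(fields) >= 7:
            signal = _to_float(fields[6])
            if signal is not None:
                signals.append(signal)
    return {
        "n_peaks": n_peaks,
        "score_range": _value_range(scores),
        "signal_range": _value_range(signals),
        "strands": strands,
    }


if __name__ == "__main__":
    import sys

    # Usage: python summarize_peaks.py path/to/peaks.bed
    with open(sys.argv[1]) as handle:
        print(summarize_peaks(handle))
```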

LiineKasak commented 2 years ago

Yeah, we ignore the data files right now, since otherwise Dean will run out of space on GitHub and would have to pay (I've run into that before, haha), and the project updates would get large. But if the file is relatively small and useful for everyone, maybe we can put it under data_small/? Or we can just share them in Slack, maybe in a new channel.

LiineKasak commented 2 years ago

Or do as I did: I added the link to the data to the README and said where the file should go and under what name.

but if it's not necessary for the code then rather share in slack imho

TaoDFang commented 2 years ago

Ah, I will create a new folder for this. I also created a new branch for this issue, mainly to test the new branch workflow.