bmoore20 / habs

Detect Harmful Algal Blooms (HABs) in images of the Finger Lakes.

Roadmap #1

Open AlephNotation opened 3 years ago

AlephNotation commented 3 years ago

I've looked it over. Cool project!

Where would you like to start?

bmoore20 commented 3 years ago

Thanks Ty!

Below are some possible paths we can take. I'm thinking that it would be a good idea to start with data augmentation and generating a larger dataset with more variation. But I am open to other starting points too if you have one in particular that you think would be better. Looking forward to touching base on Tuesday to scope out a roadmap!

Pre-processing & Improving Dataset:

Machine Learning:

Software Development:

Environment Set-up:

bmoore20 commented 3 years ago

Documenting important points from Tuesday's meeting (2.23.21)

First steps to take:

  1. Create a custom HABs dataset class using PyTorch https://pytorch.org/tutorials/beginner/data_loading_tutorial.html

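    import pathlib

    import numpy as np
    import torch
    from PIL import Image

    class HABDataset(torch.utils.data.Dataset):
        def __init__(self, data_dir: str):
            self.data_dir = pathlib.Path(data_dir)

        def _transform(self, image):
            # Resize to 32x32 and scale pixel values to [0, 1]
            return np.array(image.resize((32, 32))) / 255.0

        def __len__(self):
            # Number of images in the directory
            return len(list(self.data_dir.glob("*.png")))

        def __getitem__(self, idx):
            image_path = self.data_dir / f"path_to_data_{idx}.png"
            image = Image.open(image_path)
            target = None  # TODO: derive the label for this image
            return self._transform(image), target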
  2. Transition current Keras model to PyTorch
    • Keras model will give us a baseline to compare our new PyTorch model to
  3. Reorganize structure of repo

Future & miscellaneous items:

@AlephNotation feel free to add anything that I may have missed!

AlephNotation commented 3 years ago

@bmoore20 do you want to set up a zoom for next week?

bmoore20 commented 3 years ago

@AlephNotation yeah that would be great. Thanks Ty!

I haven't received my new Pratt email yet, so I might have to use my eamoore133@gmail.com email for this time. What days/times work for you?

bmoore20 commented 3 years ago

3.8.21 Meeting Notes

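from pathlib import Path

from PIL import Image

# Methods of the HABs dataset class; self.images is a list of image paths.
def _get_image(self, idx):
    im_path = self.images[idx]
    image = Image.open(im_path)
    return self._transform(image)

def _make_target(self, idx):
    im_path = Path(self.images[idx])
    # The class is the folder that directly contains the image;
    # parents[-1] would be the top-level path component, not the class folder.
    _class = im_path.parent.name
    if ....

    return target

def __getitem__(self, idx):
    image = self._get_image(idx)
    target = self._make_target(idx)

    return image, target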
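A sketch of how `_make_target` could map the class folder to an integer label, assuming images are stored as `<data_dir>/<class_name>/<image>.png`. The folder names `bloom` and `no_bloom` and the `TargetSketch` class are hypothetical, and a dict lookup stands in for the `if ....` chain in the notes.

```python
from pathlib import Path
from typing import Dict, Sequence


class TargetSketch:
    """Sketch of _make_target: the label is the name of the image's parent folder."""

    def __init__(self, images: Sequence[str], class_names: Sequence[str] = ("bloom", "no_bloom")):
        self.images = list(images)
        # Hypothetical folder names mapped to integer labels: {"bloom": 0, "no_bloom": 1}
        self.class_to_idx: Dict[str, int] = {name: i for i, name in enumerate(class_names)}

    def _make_target(self, idx: int) -> int:
        im_path = Path(self.images[idx])
        _class = im_path.parent.name  # folder that directly contains the image
        if _class not in self.class_to_idx:
            raise ValueError(f"unknown class folder {_class!r} for {im_path}")
        return self.class_to_idx[_class]
```

A dict keeps the label mapping in one place, so adding a class means adding one folder name rather than another `if` branch.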

Thanks for meeting today @AlephNotation !

bmoore20 commented 3 years ago

3.23.21 Meeting Notes

bmoore20 commented 3 years ago

3.30.21 Meeting Important Points

bmoore20 commented 3 years ago

4.6.21 Meeting Important Points

bmoore20 commented 3 years ago

4.13.21 Meeting Important Points

import inspect
import logging
from typing import Any, List, Tuple

logging.captureWarnings(True)
logger = logging.getLogger(__name__)
#################

def name_and_args() -> List[Tuple[str, Any]]:
    """
    Helper function to collect the args of the function it is called in.

    :return: List of (arg name, value) tuples
    """
    caller = inspect.stack()[1][0]
    args, _, _, values = inspect.getargvalues(caller)
    return [(i, values[i]) for i in args]
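A usage sketch for `name_and_args`: a hypothetical `train` function logs its own arguments on entry, which is the "record as much info as possible" idea from the to-do list. `train` and its parameters are made up for illustration, and the helper is repeated so the block runs on its own.

```python
import inspect
import logging
from typing import Any, List, Tuple

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def name_and_args() -> List[Tuple[str, Any]]:
    """Return (arg name, value) pairs for the function that calls this helper."""
    caller = inspect.stack()[1][0]  # frame of the calling function
    args, _, _, values = inspect.getargvalues(caller)
    return [(i, values[i]) for i in args]


def train(epochs: int, lr: float, batch_size: int = 32) -> List[Tuple[str, Any]]:
    # Hypothetical entry point: record every input (defaults included) so the run can be recreated
    run_args = name_and_args()
    logger.info("train called with %s", run_args)
    return run_args
```

Calling `train(10, 0.001)` logs `[('epochs', 10), ('lr', 0.001), ('batch_size', 32)]`; default values show up too, because the helper reads the caller's locals rather than the call site.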


- Notes:
  - We both decided that it would be best to require the user to always pass a PyTorch `transforms` object to `HABsDataset`. If `transforms` is not passed in, an error is thrown. This is because, at the VERY LEAST, Rescale and Crop transforms need to be passed in, and ToTensor and Normalize are a good idea too. But we are letting the user take full responsibility for which transforms are used, as long as at least one is.
    - We were originally thinking of adding a conditional in `dataset.py` that applied ToTensor and Normalize transforms if the user passed _None_ for `transforms`, but we decided that would be overcompensating for the user's actions. It would only make it less clear to the user what the program is doing. Trying to "out-smart" the user is not a good idea.
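A stdlib-only sketch of the "throw an error if there are no transforms" rule, so it runs without PyTorch. In the repo the class would subclass `torch.utils.data.Dataset`, and `transforms` would typically be something like torchvision's `transforms.Compose([transforms.Resize(...), transforms.CenterCrop(...), transforms.ToTensor(), transforms.Normalize(...)])`. `HABsDatasetSketch` and the error message are illustrative, not the project's actual code.

```python
from typing import Any, Callable, Optional, Sequence


class HABsDatasetSketch:
    """Minimal sketch: the caller must always supply a transform pipeline."""

    def __init__(self, samples: Sequence[Any], transforms: Optional[Callable[[Any], Any]] = None):
        if transforms is None:
            # Fail loudly instead of silently choosing default transforms for the user
            raise ValueError(
                "HABsDataset requires `transforms` (at minimum a rescale and a crop)."
            )
        self.samples = samples
        self.transforms = transforms

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Any:
        # Every sample goes through exactly the transforms the user chose
        return self.transforms(self.samples[idx])
```

Checking for `None` explicitly (rather than relying on the `TypeError` from a missing argument) also rejects an explicit `transforms=None`, with a message that says what is expected.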

- To do:
  - Add logging code
    - code that gets the names and arguments -> want to record as much info as possible so we can recreate really good inputs
  - Throw error if there are no transforms
  - Add input parameter for magnitude_increase variable 
  - Merge in code for `feature/capture-seeds` and `feature/typer`
  - Look into how to get my repo into Colab 
     - write script that syncs my code to Google Drive 
     - set up a separate 30 min meeting with Ty for setting up Colab?

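A rough sketch of the "sync my code to Google Drive" script, assuming Drive is reachable as a normal local folder (for example through the Google Drive desktop client); in Colab the same folder can then be mounted with `drive.mount("/content/drive")` from `google.colab`. `sync_to_drive` and both paths are placeholders. It is a one-way copy: files deleted locally are not deleted on Drive.

```python
import shutil
from pathlib import Path


def sync_to_drive(repo_dir: str, drive_dir: str) -> Path:
    """Copy the local repo into a Drive folder, overwriting files from earlier syncs."""
    src = Path(repo_dir)
    dst = Path(drive_dir) / src.name
    shutil.copytree(
        src,
        dst,
        dirs_exist_ok=True,  # Python 3.8+: update an existing copy in place
        ignore=shutil.ignore_patterns(".git", "__pycache__", "*.pyc"),
    )
    return dst
```

Skipping `.git` keeps the Drive copy small; if the full history is needed in Colab, cloning the GitHub repo there is the simpler route.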
bmoore20 commented 3 years ago

4.20.21 Meeting Important Points

To do:

Notes:

Links:

bmoore20 commented 3 years ago

4.27.21 Meeting Notes

bmoore20 commented 3 years ago

@AlephNotation I read a little more about why we need the validation set, because it is still fairly new to me. Between the explanation you gave me today and what I read, it is starting to make sense. Below is my current understanding of the next steps I should take. To make sure we are on the same page before I make changes in PR #39, can you please take a quick look and check whether my logic is correct?

Split the original HABs dataset of images into a total of 3 subsets:

  1. training set (60%)
  2. validation set (20%)
  3. test set (20%)
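One way to get the 60/20/20 split, sketched with only the standard library (`split_indices` is a hypothetical helper). The index lists could then be wrapped with `torch.utils.data.Subset`; PyTorch's `torch.utils.data.random_split` does a similar job directly.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(
    n: int, fractions: Sequence[float] = (0.6, 0.2, 0.2), seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n-1 and split them into train/validation/test subsets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # fixed seed -> same split on every run
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test
```

Fixing the seed keeps the three subsets identical across runs, so validation and test losses from different experiments stay comparable.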

The training set and validation set will both be used inside the epoch loop, passed into training_lap and validation_lap respectively (which I still have to write). The test set will be passed into my current evaluate method, which runs after all of the training is completed and uses the final chosen model. So I am going to keep my current evaluate method, but also create two new methods, training_lap and validation_lap. The validation_lap does not perform backpropagation or use an optimizer, but it does calculate loss. Both methods will return a loss.
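A minimal sketch of the epoch loop described above, with `training_lap` and `validation_lap` passed in as plain callables so it runs without PyTorch. Keeping the model with the lowest validation loss as the "final chosen model" is an assumption; the plan doesn't say how that model is picked. `fit` is a hypothetical name. In the PyTorch version, validation_lap would call `model.eval()` and run under `torch.no_grad()`.

```python
import copy
import math
from typing import Any, Callable, List, Optional, Tuple


def fit(
    model: Any,
    training_lap: Callable[[Any], float],
    validation_lap: Callable[[Any], float],
    epochs: int,
) -> Tuple[Optional[Any], List[Tuple[float, float]]]:
    """Run one training lap and one validation lap per epoch; keep the best model."""
    best_loss = math.inf
    best_model = None
    history: List[Tuple[float, float]] = []
    for _ in range(epochs):
        train_loss = training_lap(model)  # forward + backprop + optimizer step
        val_loss = validation_lap(model)  # forward only: no backprop, no optimizer
        history.append((train_loss, val_loss))
        if val_loss < best_loss:
            best_loss = val_loss
            best_model = copy.deepcopy(model)  # snapshot of the best model so far
    return best_model, history
```

The returned `best_model` is what would then go to the evaluate method with the test set. With a real PyTorch model it is usually cheaper to snapshot `model.state_dict()` than to deep-copy the whole model.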

Thanks! Also - sorry I was a little discombobulated on the call today

bmoore20 commented 3 years ago

5.4.21 Meeting Notes

bmoore20 commented 3 years ago

5.11.21 Meeting Notes

bmoore20 commented 3 years ago

5.18.21 Meeting Notes

bmoore20 commented 3 years ago

6.1.21 Meeting Notes

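from datetime import datetime
from pathlib import Path
now = datetime.now()
now_str = now.strftime("%Y-%m-%d_%H-%M-%S")
RUN_DIR = Path(TENSORBOARD_DIR) / now_str  # TENSORBOARD_DIR: project constant for the log dir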


https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter
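A small, testable version of the timestamped run-directory idea (`make_run_dir` is a hypothetical name). `strftime` only substitutes `%`-prefixed directives, so the format string needs `%Y-%m-%d_%H-%M-%S`. The returned path can be passed as `SummaryWriter(log_dir=str(run_dir))`; pointing `tensorboard_dir` at a folder on the mounted Google Drive would keep the logs after the Colab instance shuts down.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_dir(tensorboard_dir: str, now: Optional[datetime] = None) -> Path:
    """Create and return <tensorboard_dir>/<YYYY-mm-dd_HH-MM-SS> for one training run."""
    now = now or datetime.now()
    run_dir = Path(tensorboard_dir) / now.strftime("%Y-%m-%d_%H-%M-%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

One folder per run means TensorBoard can show every run side by side instead of mixing their curves.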

- A Colab notebook runs on a temporary instance. When we close it, we lose our history, which is why we have to reload the HABs GitHub directory every time. Therefore, we want to write outputs to the Google Drive directory.
- ResNets -> skip it
- Replace convolution layers with ResNet layers
- People are moving away from Batch Norms (article from Ty)

bmoore20 commented 3 years ago

Looks like someone is already working on a similar project for HABs:

bmoore20 commented 3 years ago

6.10.21 Meeting Notes

bmoore20 commented 3 years ago

No meeting on 6.15.21

bmoore20 commented 3 years ago

6.22.21 Meeting Notes

bmoore20 commented 3 years ago

6.29.21 Meeting Notes

Next Steps