AlephNotation opened 3 years ago
Thanks Ty!
Below are some possible paths we can take. I'm thinking that it would be a good idea to start with data augmentation and generating a larger dataset with more variation. But I am open to other starting points too if you have one in particular that you think would be better. Looking forward to touching base on Tuesday to scope out a roadmap!
Pre-processing & Improving Dataset
Machine Learning:
Software Development:
Environment Set-up:
First steps to take:
Create a custom HABs dataset class using PyTorch https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
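import pathlib

import numpy as np
import torch
from PIL import Image

class HABDataset(torch.utils.data.Dataset):
    def __init__(self, data_dir: str):
        self.data_dir = pathlib.Path(data_dir)
    def _transform(self, image):
        # resize to 32x32 and scale pixel values to [0, 1]
        return np.array(image.resize((32, 32))) / 255.0
    def __len__(self):
        # number of images in the data directory
        return len(list(self.data_dir.glob("*.png")))
    def __getitem__(self, idx):
        image_path = self.data_dir / f"path_to_data_{idx}.png"
        image = Image.open(image_path)
        target = None  # TODO: look up the label for this image
        return self._transform(image), target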
top
    data \
        image_set 1
        image_set 2
    hab \
        dataset.py
        utils
        train
        model \
            model.py
            blocks \
                resnet.py
Future & miscellaneous items:
@AlephNotation feel free to add anything that I may have missed!
@bmoore20 do you want to set up a Zoom call for next week?
@AlephNotation yeah that would be great. Thanks Ty!
I haven't received my new Pratt email yet, so I might have to use my eamoore133@gmail.com email this time. What days/times work for you?
3.8.21 Meeting Notes
dataset.py:
- Use torchvision.transforms.Compose to create the list of Transforms to be passed into the dataset
- __len__()
- __getitem__ to save memory
    def _get_image(self, idx):
        im_path = self.images[idx]
        image = Image.open(im_path)
        return self._transform(image)

    def _make_target(self, idx):
        im_path = Path(self.images[idx])
        # the class is the name of the folder that contains the image
        _class = im_path.parent.name
        if ....
        return target

    def __getitem__(self, idx):
        image = self._get_image(idx)
        target = self._make_target(idx)
        return image, target
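The `_make_target` idea of deriving the label from the image's folder can be sketched as below. It assumes each image sits directly inside a folder named after its class (e.g. `data/<class_name>/img.png`). `CLASS_TO_IDX` and its class names are hypothetical placeholders.

```python
from pathlib import Path

# Hypothetical class names; replace with the real HAB categories.
CLASS_TO_IDX = {"hab": 1, "non_hab": 0}


def make_target(image_path: str) -> int:
    """Return the integer label for an image based on its parent folder name.

    Assumes the layout <root>/<class_name>/<image file>.
    """
    class_name = Path(image_path).parent.name
    if class_name not in CLASS_TO_IDX:
        raise ValueError(f"Unknown class folder {class_name!r} for {image_path}")
    return CLASS_TO_IDX[class_name]


print(make_target("data/hab/img_001.png"))
```

Using `parent.name` matters here. `Path.parents[-1]` is the top-most ancestor (`.` for a relative path), not the folder the image lives in.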
Thanks for meeting today @AlephNotation !
3.23.21 Meeting Notes
- train.py ... out into a function -> "lap" (Stephen's term)
- eval() mode before testing -> turns off dropout
- Black
- PyTest for tests
- transformations.py -> random crops, 90 degree flips, etc.
- __len__ (will no longer be able to be calculated using the length of the image_paths array, because we are making thousands more images out of these images)
- Should classify mode also be given the same transformations? -> Both Ty and I will do reading on this and then we will talk about what we found.
3.30.21 Meeting Important Points
4.6.21 Meeting Important Points
4.13.21 Meeting Important Points
Links:
Logging
# at the top of any file that uses logger
import logging
import logging_utils  # project module that provides the ch and fh handlers
#### Logging ####
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s] PW4k:%(levelname)s - %(name)s - %(message)s",
    handlers=[logging_utils.ch, logging_utils.fh],
)
logging.captureWarnings(True)
logger = logging.getLogger(__name__)
#################
import inspect
from typing import Any, List, Tuple

def name_and_args() -> List[Tuple[str, Any]]:
    """Helper function to collect the args of the function it is called in.

    :return: List of (arg name, value) tuples
    """
    caller = inspect.stack()[1][0]
    args, _, _, values = inspect.getargvalues(caller)
    return [(i, values[i]) for i in args]
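Here is a small usage sketch of `name_and_args` for the logging To-do (recording each function's arguments so a good run can be recreated). The helper is repeated so the snippet runs on its own. `train` and its parameters are hypothetical.

```python
import inspect
import logging
from typing import Any, List, Tuple

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def name_and_args() -> List[Tuple[str, Any]]:
    """Return (name, value) pairs for the arguments of the calling function."""
    caller = inspect.stack()[1][0]
    args, _, _, values = inspect.getargvalues(caller)
    return [(i, values[i]) for i in args]


def train(lr: float = 0.001, epochs: int = 10, batch_size: int = 32):
    # Hypothetical training entry point: log every argument it was called with,
    # including defaults that the caller did not pass explicitly.
    run_args = name_and_args()
    logger.info("train called with %s", run_args)
    return run_args


train(lr=0.01, epochs=5)
```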
- Notes:
- We both decided that it would be best to make the user always pass in a PyTorch `transforms` object to `HABsDataset`. If `transforms` is not passed in, an error is thrown. This is because at the VERY LEAST a Rescale and Crop transform needs to be passed in, and ToTensor and Normalize are a good idea too. But we are allowing the user to take full responsibility for which transforms are used, as long as at least one is.
- We were originally thinking of adding a conditional in `dataset.py` that performed ToTensor and Normalize transforms if the user passed in _None_ for `transforms`, but we decided that this would be overcompensating for the user's actions. It would just make it more confusing and less clear to the user what the program is doing. Trying to "out-smart" the user is not a good idea.
- To do:
- Add logging code
- code that gets the names and arguments -> want to record as much info as possible so we can recreate really good inputs
- Throw error if there are no transforms
- Add input parameter for magnitude_increase variable
- Merge in code for `feature/capture-seeds` and `feature/typer`
- Look into how to get my repo into Colab
- write script that syncs my code to Google Drive
- set up a separate 30 min meeting with Ty for setting up Colab?
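The To-do items about throwing an error when there are no transforms and adding a `magnitude_increase` parameter could fit together as below. This is a sketch under a few assumptions:

- `magnitude_increase` is treated as the number of augmented copies per original image. `__len__` scales by it, and `__getitem__` maps an index back to its source image with `idx % n`.
- The class is a plain map-style dataset; it could subclass `torch.utils.data.Dataset` unchanged.
- Image loading is stubbed by a hypothetical `load_image` callable.

```python
from typing import Callable, Optional, Sequence


class AugmentedDataset:
    """Map-style dataset that serves each image `magnitude_increase` times."""

    def __init__(
        self,
        image_paths: Sequence[str],
        transforms: Optional[Callable],
        magnitude_increase: int = 1,
        load_image: Callable[[str], object] = lambda p: p,  # hypothetical loader
    ):
        if transforms is None:
            # The user must always supply transforms (at least rescale + crop).
            raise ValueError("transforms must be provided")
        if magnitude_increase < 1:
            raise ValueError("magnitude_increase must be >= 1")
        self.image_paths = list(image_paths)
        self.transforms = transforms
        self.magnitude_increase = magnitude_increase
        self.load_image = load_image

    def __len__(self) -> int:
        # Each original image is reused magnitude_increase times,
        # so the length is no longer just len(image_paths).
        return len(self.image_paths) * self.magnitude_increase

    def __getitem__(self, idx: int):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        source_path = self.image_paths[idx % len(self.image_paths)]
        # Random transforms produce a different variant on each call.
        return self.transforms(self.load_image(source_path))
```

The extra length only adds variation when the transforms are random (crops, flips). With deterministic transforms, the copies would be identical.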
4.20.21 Meeting Important Points
To do:
- device in train.py
- model, optimizer, and # of epochs configurable (arguments)
Notes:
- terminate a Colab session when you are done working ... don't leave it running
- touch command creates and names a new blank file
Links:
4.27.21 Meeting Notes
- Black code formatter with the HABs repo
- training_lap and validation_lap
    - training_lap -> how you update model weights, optimizer and back propagation
    - validation_lap -> used to see how good the hyperparameters are, no optimizer/back prop
- training_lap and validation_lap
- learning rate optimizers
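Below is a minimal sketch of `training_lap` and `validation_lap` as described in these notes. It assumes a standard PyTorch classifier, a `DataLoader` yielding `(images, targets)` batches, and a loss such as `nn.CrossEntropyLoss`. The signatures are illustrative, not the repo's actual code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader


def training_lap(model: nn.Module, loader: DataLoader, loss_fn: nn.Module,
                 optimizer: torch.optim.Optimizer, device: str = "cpu") -> float:
    """One pass over the training data: forward, back-prop, weight update."""
    model.train()  # enables dropout and batch-norm statistic updates
    total_loss = 0.0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), targets)
        loss.backward()   # back propagation
        optimizer.step()  # update the weights
        total_loss += loss.item() * images.size(0)
    return total_loss / len(loader.dataset)  # mean loss per sample


@torch.no_grad()
def validation_lap(model: nn.Module, loader: DataLoader, loss_fn: nn.Module,
                   device: str = "cpu") -> float:
    """One pass over the validation data: loss only, no optimizer or back-prop."""
    model.eval()  # disables dropout
    total_loss = 0.0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        total_loss += loss_fn(model(images), targets).item() * images.size(0)
    return total_loss / len(loader.dataset)
```

Both laps return a loss, so the epoch loop can compare training and validation loss side by side.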
@AlephNotation I read a little more on why we need the validation set, because it is still fairly new to me. With the explanation you gave me today and what I read, it is starting to make sense. Below is my current understanding of the next steps I should take. To make sure we are on the same page before I make changes in PR #39, can you please take a quick look and check whether my logic is correct?
Split the original HABs dataset of images into a total of 3 subsets:
The training set and validation set will both be used inside the epoch loop, passed into training_lap and validation_lap respectively (which I still have to make). The test set will be passed into my current evaluate method, which runs after all of the training is completed and uses the final chosen model. So I am going to keep my current evaluate method, but also create two new methods, training_lap and validation_lap. The validation_lap does not perform back propagation or use an optimizer, but it does calculate loss. Both methods will return a loss.
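The three-way split described above can be sketched with shuffled indices. The 70/15/15 fractions and the fixed seed are assumptions, not values from the notes. Each index list can then be wrapped with `torch.utils.data.Subset(dataset, indices)`. The training and validation subsets feed the laps inside the epoch loop, and the test subset feeds `evaluate` once training is done.

```python
import random
from typing import List, Tuple


def split_indices(n: int, train_frac: float = 0.7, val_frac: float = 0.15,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle range(n) and cut it into disjoint train/val/test index lists."""
    if train_frac + val_frac >= 1.0:
        raise ValueError("train_frac + val_frac must leave room for a test set")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the test indices out of both laps is what makes the final `evaluate` score an honest estimate.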
Thanks! Also - sorry I was a little discombobulated on the call today
5.4.21 Meeting Notes
- for epoch in range(epochs): log the epoch # and the training loss to 7 decimal places
- DataLoader object and do for batch in data_loader: in _traininghelper.py
- randomCrop on images ... maybe use a progressive GAN?
5.11.21 Meeting Notes
writer.add_scalar("Loss/train", train_loss, epoch)
writer.add_scalar("Loss/val", val_loss, epoch)
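This sketch combines the 5.4.21 epoch loop (log the epoch number and training loss to 7 decimal places) with the `writer.add_scalar` calls above. `run_epochs`, `train_step`, `val_step` and `PrintWriter` are hypothetical names. `writer` is meant to be a `torch.utils.tensorboard.SummaryWriter`, but any object with an `add_scalar(tag, value, step)` method works, which keeps the sketch runnable without TensorBoard.

```python
import logging
from typing import Callable, List, Tuple

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def run_epochs(num_epochs: int, train_step: Callable[[], float],
               val_step: Callable[[], float], writer) -> List[Tuple[float, float]]:
    """Run one training and one validation lap per epoch and record both losses."""
    history = []
    # num_epochs is a separate name, so the loop variable does not shadow it.
    for epoch in range(num_epochs):
        train_loss = train_step()
        val_loss = val_step()
        # %.7f logs each loss to 7 decimal places.
        logger.info("epoch %d | train loss %.7f | val loss %.7f", epoch, train_loss, val_loss)
        writer.add_scalar("Loss/train", train_loss, epoch)
        writer.add_scalar("Loss/val", val_loss, epoch)
        history.append((train_loss, val_loss))
    return history


class PrintWriter:
    """Stand-in for SummaryWriter that prints each scalar instead of writing event files."""

    def add_scalar(self, tag, value, step):
        print(f"{tag} step={step} value={value:.7f}")


run_epochs(2, lambda: 0.5, lambda: 0.4, PrintWriter())
```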
- add __repr__ to my custom Transformation classes so they can be represented as a string and recorded when "print" is called
- __repr__ on Transformation objects and model so we can print and document them for each run in our logger
5.18.21 Meeting Notes
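Here is a sketch of a custom Transformation class that covers the `__repr__` item from the previous meeting, so printing the transform (e.g. in the run's log) records its parameters. It uses the plain-callable `__call__` style that `torchvision.transforms.Compose` accepts. The other option raised below is subclassing `torch.nn.Module` and implementing `forward`. `RandomRot90`, its `p` and `seed` parameters, and the use of NumPy arrays are assumptions for illustration.

```python
import random

import numpy as np


class RandomRot90:
    """Rotate an HxW(xC) image array by a random multiple of 90 degrees."""

    def __init__(self, p: float = 0.5, seed=None):
        self.p = p  # probability of applying a rotation
        self.rng = random.Random(seed)

    def __call__(self, image: np.ndarray) -> np.ndarray:
        if self.rng.random() >= self.p:
            return image  # leave the image unchanged
        k = self.rng.choice([1, 2, 3])  # 90, 180, or 270 degrees
        # Rotate in the plane of the first two (spatial) axes.
        return np.rot90(image, k=k, axes=(0, 1))

    def __repr__(self) -> str:
        # Shows up in print() and in log messages for each run.
        return f"{self.__class__.__name__}(p={self.p})"
```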
- __call__ or forward for transforms
6.1.21 Meeting Notes
- makeWriter function that handles creating the path for the writer
from datetime import datetime
from pathlib import Path
now = datetime.now()
now_str = now.strftime("%Y-%m-%d_%H-%M-%S")
RUN_DIR = Path(TENSORBOARD_DIR) / now_str  # TENSORBOARD_DIR: project constant for the log root
https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter
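A possible `makeWriter` helper is sketched below, written here as `make_run_dir` (a hypothetical name). It uses `strftime` directives so each run gets its own timestamped TensorBoard directory, and it creates that directory with only the standard library. The returned path can then be passed as `SummaryWriter(log_dir=str(run_dir))`.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_dir(tensorboard_dir: str, now: Optional[datetime] = None) -> Path:
    """Create and return <tensorboard_dir>/<YYYY-mm-dd_HH-MM-SS>."""
    now = now or datetime.now()
    # %-directives are required; a literal "YYYY-mm-dd" would be copied as-is.
    run_dir = Path(tensorboard_dir) / now.strftime("%Y-%m-%d_%H-%M-%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

On Colab, pointing `tensorboard_dir` at a Google Drive folder keeps the runs after the session ends.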
- A Colab notebook runs on a temporary instance. When we close out of it, we lose our history. That is why we have to reload the HABs GitHub directory every time. Therefore, we want to write to the Google Drive directory.
- ResNets -> skip it
- Replace convolution layers with ResNet layers
- People are moving away from Batch Norms (article from Ty)
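Replacing plain convolution layers with ResNet layers means adding a skip connection around a small convolution stack. Below is a minimal sketch of such a block. It assumes equal input and output channels, so the identity shortcut needs no projection. It uses BatchNorm as in the original ResNet design; given the note about moving away from batch norms, `nn.BatchNorm2d` could be swapped out. `ResidualBlock` is an illustrative name, not the contents of the repo's `blocks/resnet.py`.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions plus an identity skip connection: out = relu(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        # padding=1 keeps the spatial size, so F(x) and x can be added.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Skip connection: add the input back before the final activation.
        return self.relu(out + identity)
```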
Looks like someone is already working on a similar project for HABs:
6.10.21 Meeting Notes
Few-shot & One-shot => GOOGLE TERMS
No meeting on 6.15.21
6.22.21 Meeting Notes
Talked through error below...
Bigger batch sizes are better -> multiple samples to do back-prop
Nerds -> pick batches in powers of 2 -> 2, 4, 8, 16, 32...
ResNet Model and Convolutions
ResNet:
6.29.21 Meeting Notes
param.requires_grad = False -> back-prop won't update that parameter
Next Steps
I've looked it over. Cool project!
Where would you like to start?