dobkeratops / convnet_stuff


Adding sample training_images to the repository; using glob.glob and pathlib; about the dataset structure - separate directories for input/output etc.? #5

Open Twenkid opened 1 year ago

Twenkid commented 1 year ago

What about uploading a folder with sample images to the repository, in order to allow quick start when cloning it?

Also, I was a bit confused by the dataset format and how it's read. It would be fine if we had a tool to generate these suffixes etc.

One idea: if there is supposed to be a single output per sample, then the output list seems like the natural main list / anchor.

/output
1.jpg, 2.jpg, whatever.png

/input 
1_0.jpg, 1_1.jpg, 1_bright.jpg, 1_dark.jpg, 1_x.jpg 
# 5 inputs total; the specifier is the string after the last underscore, not just a sequential number
2_0.jpg, 2_some.jpg  # 2 inputs
whatever_cutout.jpg, whatever_perspective.jpg, whatever_noise.jpg  # 3 inputs

...

If we are going to do long training runs and log results, another idea I have from these mappings is to introduce a simple SQLite database. ...

As for the mapping, I recently had a related use case with experimental colorization of grayscale deepfakes.

It was only a 1:1 mapping and I used the same filenames in different directories plus a map, because it was a multithreaded program - just a list relying on matching filenames wasn't enough.

Also, I see you use os.listdir(). Do you know glob.glob? I think it's more convenient, as it can filter by extension automatically. There's also pathlib, which may simplify some operations.

https://docs.python.org/3/library/glob.html https://docs.python.org/3/library/pathlib.html
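For illustration, a minimal sketch of how those modules could pair outputs to inputs under the "name_suffix" convention sketched above (directory names and extensions here are assumptions, not the repo's actual loader):

from pathlib import Path
from collections import defaultdict

dataset_dir = Path("dataset")   # hypothetical layout: dataset/output, dataset/input
outputs = sorted(p for p in (dataset_dir / "output").glob("*")
                 if p.suffix.lower() in (".jpg", ".png"))

pairs = defaultdict(list)
for inp in (dataset_dir / "input").glob("*"):
    if inp.suffix.lower() not in (".jpg", ".png"):
        continue
    # "1_bright.jpg" -> anchor name "1"; the part after the last '_' is the variant tag
    base, _, tag = inp.stem.rpartition("_")
    pairs[base].append(inp)

for out in outputs:
    print(out.name, "->", sorted(p.name for p in pairs.get(out.stem, [])))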

dobkeratops commented 1 year ago

Agreed, this is necessary so that it can run out of the box. Exactly how it works is currently slightly in flux, but that should be possible. The readme should include an example command line for the training mode.

It's not far off now: I've got the main thing I wanted working - transforming multiple images. Pretty soon I can get onto the real work, which is generating the training data. (There's one important detail remaining to clean up - the downsample shape alignment. I don't like it being 255->127 etc.; it should be 256->128->... But even with it this way, in PyTorch I can recentre how I like in my own implementation, and there is a way to change the filters to do this in PyTorch as well.)

I probably already showed you: I have been collaborating a bit with this other person making an online dataset. Part of the plan is to integrate with that (per-pixel annotations can work as a set of bitmap planes defining output labels).

But in the immediate future I can set up 2 key examples (which is what I have locally): "multiple inputs, single output" and "single input, multiple outputs". I think the code should handle multiple -> multiple as well; the little tests I did splitting RGB should be enough.

dobkeratops commented 1 year ago

If we are going to do long training runs and log results, another idea I have from these mappings is to introduce a simple SQLite database.

Never touched this before. As with TensorBoard, I have a limited tolerance for bringing in dependencies and new tools; the web world tends to explode. I've made a huge concession by using PyTorch instead of writing a custom NN framework in Rust from the ground up :). I want to stay focused on graphics.

But maybe at some point it would be useful for me to use a database; I realise those are essential for serious web projects. It could be nice to set this up as a service for other people or something like that (I'm sure there are dedicated things for that already as well).

Maybe it's possible the visualiser and logging could be partitioned off in the same way, so they could be done together as a plugin?

Regarding image directory size: I think facial detail takes huge amounts of data because we're so good at perceiving faces - there's so much that can go wrong. But the main part of this is going to be simpler, more like glorified procedural texturing; it doesn't have to be photoreal. I'm hoping 100 MB nets will be sufficient, so 10 GB of image data (which should take tens of seconds to load off a modern NVMe drive) should be enough to create an 'information bottleneck'. I think the size of the nets will be constrained by the idea of running this in realtime: a 100 MB net to enhance a 1-32 MB game.

But I haven't done a serious test of this yet. I'm pulling these numbers out of thin air (I've got some ballparks from ImageNet entries; also I doubt we will need something as big as the 5 GB Stable Diffusion model, which can handle anything. We can tailor one net for one game).

dobkeratops commented 1 year ago

For something more elaborate, though: I should be able to make an interface around their data loader, to again give an easy place to plug in a more elaborate way to feed the training process, so in the end we can explore realtime video training, massive million-to-billion image datasets etc. A "multi-input multi-output data loader" inheriting from their basic one -> derive a simple "load a directory" example from that, and then later do realtime video streams etc.
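For what it's worth, a minimal sketch of such a "multi-input multi-output data loader" built on PyTorch's basic Dataset (class and variable names are mine, purely illustrative):

import torch
from torch.utils.data import Dataset, DataLoader

class MultiInOutDataset(Dataset):
    def __init__(self, samples):
        # samples: list of (list_of_input_tensors, list_of_output_tensors)
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        inputs, outputs = self.samples[idx]
        # stack along the channel dim, so N inputs become one multi-channel tensor
        return torch.cat(inputs, dim=0), torch.cat(outputs, dim=0)

# toy usage: 2 inputs -> 1 output per sample; a "load a directory" variant would
# build `samples` from file pairs gathered as in the glob/pathlib sketch above
samples = [([torch.randn(3, 64, 64), torch.randn(3, 64, 64)], [torch.randn(3, 64, 64)])
           for _ in range(8)]
loader = DataLoader(MultiInOutDataset(samples), batch_size=4, shuffle=True)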

dobkeratops commented 1 year ago

https://github.com/ImageMonkey/imagemonkey-core - there's a bunch of street scene annotations and so on here, some markup of surface textures, some labelled body parts for figures. I liked this person's goal of making an online image dataset - there was something similar called 'labelme', the bulk of which was older, lower-res images; he made a site that was presented more nicely as well. The more options for data the better.

dobkeratops commented 1 year ago

As for separate directories, I did think that might be nice as well: dataset/input/, dataset/output/. I went with the _ naming convention initially. One use case is PBR texture channel generators; some of those tools take _metalness, _roughness, _ao etc. (this is something I've wanted for years..). And at some point I'll be writing images & z-buffer out from my engine - I'd like to try neural shading.

Twenkid commented 1 year ago

(There's one important detail remaining to clean up - the downsample shape alignment. I don't like it being 255->127 etc.; it should be 256->128->... But even with it this way, in PyTorch I can recentre how I like in my own implementation, and there is a way to change the filters to do this in PyTorch as well.)

255->127 ? strange

I probably already showed you, I have been collaborating a bit with this other person making an online dataset. part of the plan is to integrate with that (per-pixel annotations can work as a set of bitmap planes defining output labels)

Now that you've said it, I recall you mentioned it in some of the FB chats. Collaborating is great, but as for annotation, I think there are many other options, too:

https://en.eagle.cool/blog/post/image-annotation-tool https://humansintheloop.org/10-of-the-best-open-source-annotation-tools-for-computer-vision-2022/

One popular dataset/toolset for segmentation is COCO: https://cocodataset.org/#home

dobkeratops commented 1 year ago

The fixed datasets are great for consistent benchmarks - what I liked about the other guy's idea is an open-ended dataset that grows continuously.

I wonder if Stable Diffusion will get more people into (narrow) AI. The quality depends on the training data, and anyone can contribute to that.

Some people say that, just like YouTube eventually being sued after running with piracy to gain users for so long, what will happen is that Shutterstock etc. will eventually pounce on Stability AI and shut them down over the legal grey area of scraping.

In other discussions I had people warning me "no, this is what will happen if you use it for games..". They predicted that eventually Shutterstock, with the biggest pro image library, would dominate the field. They show proof that the dataset contains watermarks (I've even seen it generate a few myself). I note that they "cover their ass" in that they require a login "so they can contact you.." if you used it.

What's needed is a truly community-driven dataset, binding text to images in ever-increasing detail - that's why I was so keen to encourage that other person to keep going.

Aggregating data from as many sources as possible will give the best results.

I think he's going to make his tool compatible with some of those open ones as well.

dobkeratops commented 1 year ago

(There's one important detail remaining to clean up - the downsample shape alignment. I don't like it being 255->127 etc.; it should be 256->128->... But even with it this way, in PyTorch I can recentre how I like in my own implementation, and there is a way to change the filters to do this in PyTorch as well.)

255->127 ? strange

Yeah, it's because I currently do 3x3 stride-2 downsamples, and it doesn't like that. With everything I've tried, if I start with 256x256 it will drop to 127 or 129 or something else stupid instead of 128. Currently the least stupid option I could get working was 255->127->63...

I was trying 2x2 -> 1 pixel; it won't work, and I see what they do in Stable Diffusion's version is 4x4, stride 2. It should be possible with just 2x2 -> 1 pixel, surely :/ but every option I've tried so far messes it up.

I might take a look at some other nets.
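An aside from me (standard PyTorch behaviour, not a comment on the repo's code): Conv2d's output size is floor((in + 2*pad - kernel)/stride) + 1, so a 3x3 stride-2 conv gives the clean 256 -> 128 halving when padding=1, and 256 -> 127 with padding=0:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 256, 256)
print(nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=0)(x).shape)  # ..., 127, 127
print(nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)(x).shape)  # ..., 128, 128
print(nn.Conv2d(3, 8, kernel_size=2, stride=2, padding=0)(x).shape)  # ..., 128, 128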

Twenkid commented 1 year ago

I agree with building custom datasets etc., but for testing and research purposes (and for labeling "own" data) other tools are also "free".

Maybe it's possible the visualiser and logging could be partitioned off in the same way, so they could be done together as a plugin?

Yes, that's reasonable. Re SQLite/DB: I mentioned logging etc., but I also meant dataset management, logging variants of training, selecting images for training with queries ("train on ..."), etc. I see ImageMonkey uses a load of DB stuff, too. SQLite is self-contained - just an exe, no complex installation, user rights etc.
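A minimal sketch of that kind of bookkeeping with Python's built-in sqlite3 (the schema here is made up, just to show the "select images for training with a query" idea):

import sqlite3

con = sqlite3.connect("experiments.db")
con.execute("CREATE TABLE IF NOT EXISTS images (path TEXT PRIMARY KEY, tag TEXT)")
con.execute("""CREATE TABLE IF NOT EXISTS runs
               (id INTEGER PRIMARY KEY, note TEXT, epochs INTEGER, final_loss REAL)""")

con.execute("INSERT OR IGNORE INTO images VALUES (?, ?)", ("input/1_bright.jpg", "bright"))
con.execute("INSERT INTO runs (note, epochs, final_loss) VALUES (?, ?, ?)",
            ("baseline u-net", 30, 0.042))
con.commit()

# "selecting images for training with queries":
bright_only = [row[0] for row in con.execute("SELECT path FROM images WHERE tag = 'bright'")]
print(bright_only)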

Regarding image directory size: I think facial detail takes huge amounts of data because we're so good at perceiving faces - there's so much that can go wrong. But the main part of this is going to be simpler, more like glorified procedural texturing; it doesn't have to be photoreal. I'm hoping 100 MB nets will be sufficient, so 10 GB of image data (which should take tens of seconds to load off a modern NVMe drive) should be enough to create an 'information bottleneck'. I think the size of the nets will be constrained by the idea of running this in realtime: a 100 MB net to enhance a 1-32 MB game.

But I haven't done a serious test of this yet. I'm pulling these numbers out of thin air (I've got some ballparks from ImageNet entries; also I doubt we will need something as big as the 5 GB Stable Diffusion model, which can handle anything. We can tailor one net for one game). ...

I don't know about the size of the net and the dataset. Dataset size: 10 GB feels way too much to me for a 100 MB network; I don't know whether it could encompass such variety (or whether 10 GB is needed to cover the variety).

we can tailor one net for one game)

Agreed. Training both a general model and "specialists" per game is reasonable, but the specialized ones are a more realistic goal to start with, and would also be more reliable.

dobkeratops commented 1 year ago

Right, I said 10 GB and figured that's an upper limit; 100x the data of the net should be absolutely guaranteed not to overfit, and even 10x should be enough.

I think Python still loads images quite slowly though, so I might still have to look into its threaded background loading options. For the game engine I have a tool to make texture thumbnails for the streaming, which also spits out everything in a single grid to summarise.
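For reference, the usual PyTorch route for that background loading is DataLoader worker processes (the dataset here is a stand-in tensor dataset, just to keep the snippet runnable):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(64, 3, 64, 64), torch.randn(64, 3, 64, 64))
loader = DataLoader(dataset, batch_size=16, shuffle=True,
                    num_workers=4,       # images are loaded/decoded in background processes
                    pin_memory=True,     # faster host -> GPU copies
                    prefetch_factor=2)   # batches queued up per worker

for inputs, targets in loader:
    pass  # training step would go here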

dobkeratops commented 1 year ago

I agree with building custom datasets etc., but for testing, research purposes (and for labeling on "own" data) other tools are also "free"

Right, there are many ways to do it.

He's just made something to download a dump as well, so I can build something to train off it exactly the same way as any other source. I've never actually used his integrated "automatically train on some labels" service.

what I liked about this idea is that the data is 'live'.

Integrating training and labelling could be good: "show the images it struggles most with" etc., and "many pairs of eyes to find the errors..". There have been articles about how a lot of the common datasets have surprisingly large numbers of errors in them.

Anyway, it's all CC0 and can be converted into any existing format.

dobkeratops commented 1 year ago

(And for the downsampling: maxpooling in conjunction with skip-connections should give it fine-grained positional detail, but I figured I wanted the option of it working without skip connections and still being able to keep fine-grained information through to the latent space. Hinton has commented that he's not happy with max pooling.)

Of course with skip connections, you can think of every level contributing to the latent space - "the latent space is really 8x8 x 256 + 16x16 x 128 + ..." - not just that one innermost tensor.
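A schematic sketch of that point (a toy net of mine, not the repo's): with a skip connection, the decoder sees both the innermost tensor and the finer-grained encoder level.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 16, 3, padding=1)              # full-res feature map
        self.down = nn.Conv2d(16, 32, 3, stride=2, padding=1)  # 64 -> 32: inner "latent"
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)      # 32 -> 64
        self.dec = nn.Conv2d(16 + 16, 3, 3, padding=1)         # widened for the skip concat

    def forward(self, x):
        e = F.relu(self.enc(x))
        z = F.relu(self.down(e))        # the innermost tensor
        d = F.relu(self.up(z))
        d = torch.cat([d, e], dim=1)    # skip: fine-grained info re-enters here
        return self.dec(d)

print(TinyUNet()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])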

Twenkid commented 1 year ago

Integrating training and labelling could be good: "show the images it struggles most with" etc., and "many pairs of eyes to find the errors..". There have been articles about how a lot of the common datasets have surprisingly large numbers of errors in them.

Right. BTW, for my DFK training I developed a small image-matching tool (template matching) for finding bad faces when spotting them on the training display. There's a simpler way - just recording them, printing the current files shown in the preview etc. - but that search was cool, too.

BTW, re the bad samples (or labeling in that case): the sequence in which the inputs are fed also matters. That's another thing to experiment with. E.g., recently recalling that the lowest layers of image-classification networks converge to filters resembling non-DL ones such as edges, various gradients, "wavelets" etc., together with your idea of training layer-by-layer separately, and older suggestions to CogAlg/Boris to train initially on simpler images with incrementally rising complexity, I wondered:

What about pretraining sequentially on more and more complex images? How would the model evolve, and wouldn't it learn faster than if it's bombarded with random complex input?

One piece of empirical evidence for that, which I got from DFK training, came recently when I started to use the "Uniform Yaw" option - a sort of dataset balancing where the head orientations (yaw) are binned into 127 positions.

It is applied automatically for pretraining on a dataset of many faces, but I didn't use it when training single-face-to-single-face.

Due to imbalances in the dataset - too few profile images - without Uniform Yaw those ones are very foggy initially and progress very slowly; also, those items are hit rarely, are "unusual" for the model, and slow down the process.

Proper balancing, including sequencing, provides faster training/convergence.
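A hedged sketch of one standard way to get that balancing in PyTorch: WeightedRandomSampler draws rare bins (e.g. profile views) as often as common ones (the bins and dataset here are toy stand-ins):

import torch
from torch.utils.data import WeightedRandomSampler, DataLoader, TensorDataset

bins = torch.tensor([0, 0, 0, 0, 1, 1, 2])   # per-sample yaw bin; bin 2 (profiles) is rare
counts = torch.bincount(bins).float()
weights = 1.0 / counts[bins]                 # samples in rare bins get larger weights

sampler = WeightedRandomSampler(weights, num_samples=len(bins), replacement=True)
dataset = TensorDataset(torch.randn(7, 3, 8, 8), bins)
loader = DataLoader(dataset, batch_size=4, sampler=sampler)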

what I liked about this idea is that the data is 'live'.

Right, I also don't "vote" for only downloading fixed/standard datasets, but they could be used for experiments. IMO a system should download additional samples on its own etc.; a smart machine would use the Internet as a user - search with Google, take images etc. That is actually a way to collect some data directly from the Google Images search page; the results are labeled etc. :)

(There's one important detail remaining to clean up - the downsample shape alignment. I don't like it being 255->127 etc.; it should be 256->128->... But even with it this way, in PyTorch I can recentre how I like in my own implementation, and there is a way to change the filters to do this in PyTorch as well.)

255->127 ? strange

Yeah, it's because I currently do 3x3 stride-2 downsamples, and it doesn't like that. With everything I've tried, if I start with 256x256 it will drop to 127 or 129 or something else stupid instead of 128. Currently the least stupid option I could get working was 255->127->63...

I was trying 2x2 -> 1 pixel; it won't work, and I see what they do in Stable Diffusion's version is 4x4, stride 2. It should be possible with just 2x2 -> 1 pixel, surely :/ but every option I've tried so far messes it up.

I might take a look at some other nets.

I think they solve this by expanding the size of the tensor - maybe concatenating one column, or some padding. I figured out a concat sequence: https://pytorch.org/docs/stable/generated/torch.cat.html

import torch

tensor127 = torch.empty(127, 127)

col = torch.randn(127, 1)               # one extra column to append
to128 = torch.cat((tensor127, col), 1)  # (127, 127) + (127, 1) -> (127, 128)
print(to128.size())

row = torch.randn(1, 128)               # one extra row to append
to1282 = torch.cat((to128, row), 0)     # (127, 128) + (1, 128) -> (128, 128)
print(to1282.size())

>python z:\t.py
torch.Size([127, 128])
torch.Size([128, 128])

The row and col could be sliced from the original tensor (with one element missing).

There's also "pad"; with a single padding value it expands the tensor on all sides, though per-side amounts can be given too: https://pytorch.org/vision/main/generated/torchvision.transforms.functional.pad.html
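A small aside using torch.nn.functional.pad (a lower-level alternative to the torchvision transform linked above): it accepts per-side amounts, so a 127x127 map can be grown to 128x128 on just the right/bottom edges rather than all sides.

import torch
import torch.nn.functional as F

t = torch.randn(1, 1, 127, 127)
padded = F.pad(t, (0, 1, 0, 1), mode="replicate")  # (left, right, top, bottom) on the last two dims
print(padded.shape)  # torch.Size([1, 1, 128, 128])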

Twenkid commented 1 year ago

(and for the downsampling.. maxpooling in conjunction with skip-connections should give it fine grain positional detail, but I figured I wanted the option of it working without skip connections and still able to keep fine grain information through to the latent space - Hinton comments how he's not happy with max pool .

What about not just maxpooling, but other kinds of pooling/selection?

I realize that the set of different filters and the activation function in general serve as a sort of "pooling" (selection) mechanism. These are tools for automatically defining "conditional operators".

of course with skip connections, you can think of every level contributing to the latent space, "the latent space is really 8x8 x 256 + 16x16 x 128 + ...", not just that one innermost tensor.

And in U-Net / image segmentation, where tensors from the encoder pipeline are connected into the decoder, the latent space again includes both.

Yes, the code is not just the smallest tensor, as it needs the rest in order to be encoded or expanded back into images.

That reminds me of some notes regarding the DL explosion and lines of code - the story about "complexity": how the Hinton/Krizhevsky ImageNet 2012 network was 30K lines of code or so, then with TensorFlow fewer, then with Keras even fewer, etc.

They all forget the LOC of the Python interpreter and of the OS it runs on, including all the system and application software that was needed to develop C, the OS and Python to the point where the network can be encoded so succinctly.

On the same theme of compression, some thoughts from 20 years ago: the mind compresses, but in order for abstract concepts to actually become meaningful in the real world, they have to invoke lower-level representations down to the physical laws with all their details. When we move a finger it may be said that this is a few consciously controlled bits, but the change in representation and location of the finger in the real Universe is enormous; it actually involves the whole body, not just some abstract command to the finger, and not just the brain, because the finger needs the whole body in order to exist and to fully "express" its existence, location and properties.

Going even deeper, they all need the whole Universe to exist as it is; the Universe is what allows any of these changes to happen.

Thus the selection of the scope of evaluation determines how we judge the compression parameters, ratio etc. within a given framework, where we assume the other parameters are constant or disregard them etc.

dobkeratops commented 1 year ago

What about not just maxpooling, but other kinds of pooling/selection?

I realize that the set of different filters and the activation function in general serve as a sort of "pooling" (selection) mechanism. These are tools for automatically defining "conditional operators".

Right, at the minute, using PyTorch, it's a case of using the provided tools. My usual urge is to write everything from the maths up, but here I've given in and accepted this ecosystem. That makes it harder for me to experiment at such a low level; I don't enjoy digging through libraries and sourcebases.

But it should indeed be possible to make an efficient downsampler that preserves fine-grained info in a useful way. Something like: the horizontal and vertical gradients of some of the features, combined with an av-pool of everything.
A close approximation may be to combine those 1x1s with 'depthwise convolutions' (num_groups = num_channels), e.g. x[n+1] = conv2d_1x1_out256( avpool( cat( x[n], relu(horiz_grad(x[n])), relu(-horiz_grad(x[n])), relu(vert_grad(x[n])), relu(-vert_grad(x[n])) ) ) ), or x[n+1] = conv2d_1x1_out256( avpool( relu( depthwise_conv( relu( conv2d_1x1_out512(x[n]) ) ) ) ) ) - i.e. temporarily increase the channels, compare change across the visual field, downsample, then bring it back to reasonable dims.
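A rough PyTorch sketch of my reading of the above (my module and names, not the repo's code): expand channels with a 1x1 conv, take per-channel gradient-like responses with a depthwise conv, average-pool to downsample, then bring the channel count back down with another 1x1 conv.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradientPoolDown(nn.Module):
    def __init__(self, in_ch=256, mid_ch=512, out_ch=256):
        super().__init__()
        self.expand = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        # depthwise conv (groups = channels) can learn horizontal/vertical gradient filters
        self.depthwise = nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1, groups=mid_ch)
        self.reduce = nn.Conv2d(mid_ch, out_ch, kernel_size=1)

    def forward(self, x):
        y = F.relu(self.expand(x))
        y = F.relu(self.depthwise(y))
        y = F.avg_pool2d(y, kernel_size=2)   # the av-pool downsample, e.g. 64 -> 32
        return self.reduce(y)

print(GradientPoolDown()(torch.randn(1, 256, 64, 64)).shape)  # torch.Size([1, 256, 32, 32])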

I'll see how it goes.

In the end I think the real meat of this is going to be the data - generating useful input/output pairs in the first place. Using Stable Diffusion on the footage, then our net compresses/accelerates it? Or finding a mapping from the cartoonyness of retro-game features to raw images?

too many ideas to try everything.. plenty to get on with

Twenkid commented 1 year ago

What about not just maxpooling, but other kinds of pooling/selection?

I realize that the set of different filters and the activation function in general serve as a sort of "pooling" (selection) mechanism. These are tools for automatically defining "conditional operators".

Right, at the minute, using PyTorch, it's a case of using the provided tools. My usual urge is to write everything from the maths up, but here I've given in and accepted this ecosystem. That makes it harder for me to experiment at such a low level; I don't enjoy digging through libraries and sourcebases.

I also used to like building things from scratch with the least amount of dependencies, but sometimes it takes too much effort that could be saved. (BTW, I like the idea of the NN library in OpenCL; I glanced at the CUDA documentation again, and in some circumstances I may also do some exercises with these libs.)

Re the digging through: you don't have to do everything yourself, that's why we collaborate etc. Some pooling variants which pop up in torch:

https://pytorch.org/docs/stable/nn.html E.g. average pooling; a "min-pool" can be had by negating the input and max-pooling: -maxpool(-x). Etc.
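A tiny check of that "min-pool by negation" trick (plain PyTorch, nothing repo-specific):

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)
min_pooled = -F.max_pool2d(-x, kernel_size=2)   # each 2x2 block reduced to its minimum
print(min_pooled.shape)  # torch.Size([1, 1, 2, 2])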

Pooling layers:

nn.MaxPool1d/2d/3d - applies 1D/2D/3D max pooling over an input signal composed of several input planes.
nn.MaxUnpool1d/2d/3d - computes a partial inverse of the corresponding MaxPool.
nn.AvgPool1d/2d/3d - applies 1D/2D/3D average pooling over an input signal composed of several input planes.
nn.FractionalMaxPool2d/3d - applies 2D/3D fractional max pooling.
nn.LPPool1d/2d - applies 1D/2D power-average pooling.
nn.AdaptiveMaxPool1d/2d/3d - applies 1D/2D/3D adaptive max pooling.
nn.AdaptiveAvgPool1d/2d/3d - applies 1D/2D/3D adaptive average pooling.

Twenkid commented 1 year ago

but it should indeed be possible to make an efficient downsampler that preserves fine-grained info in a useful way.

"Convolution is all you need"... :) (like "Ättention is all you need")

Something like: the horizontal and vertical gradients of some of the features, combined with an av-pool of everything. A close approximation may be to combine those 1x1s with 'depthwise convolutions' (num_groups = num_channels), e.g. x[n+1] = conv2d_1x1_out256( avpool( cat( x[n], relu(horiz_grad(x[n])), relu(-horiz_grad(x[n])), relu(vert_grad(x[n])), relu(-vert_grad(x[n])) ) ) ), or x[n+1] = conv2d_1x1_out256( avpool( relu( depthwise_conv( relu( conv2d_1x1_out512(x[n]) ) ) ) ) ) - i.e. temporarily increase the channels, compare change across the visual field, downsample, then bring it back to reasonable dims.

I think particular gradients etc. for features (i.e. kernel-sized rectangles, in this context?) can be computed by convolving with selected filters: a gradient, a line on top, a line on the bottom etc. They would be part of a library of filters - not learnable, or taken from the lowest layers of the network, preserved and applied again at higher levels.

Another idea which I like and want to experiment with: applying other external methods (not limited to the NN palette), computing parameters and more abstract features, extracting them from a layer and attaching them to the conv layers / concatenating them etc., possibly intermixed with the lower-level features.

Such parameters could be relatively high-level, such as attribution to an object or a part of one - e.g. recognizing a shape/object, detecting that a point/area/coordinate is inside it and including that information as some value, while setting another value in the areas which are outside. Either or both: distance, on/off etc., or an embedding for a number of objects/items.

In the use case of 2D retro games, that could also be tile/character or tile/sprite, or a particular character etc. In general images it could be parts of objects etc.

Yes, such interleaved processing may slow down the network, but actually only if it's computed from scratch each time. However, these high-level, more abstract parameters/attributes can be computed once for each data point and then just included in the network.

They can also be injected at different layers - not only at the lowest ones, but at some of the low, the high, or anywhere.

Further, they can be partially learnable, by making them dependent on each other across a sequence of layers, and injected as precomputed values at some layers.

That way the network will have more "checkpoints" through the feedforward and feedback.
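A minimal sketch of that injection idea (the mask and layer here are hypothetical): a per-pixel attribute computed outside the net is concatenated as extra channels before a conv layer.

import torch
import torch.nn as nn

rgb = torch.randn(1, 3, 64, 64)                              # ordinary image input
object_mask = torch.randint(0, 2, (1, 1, 64, 64)).float()    # precomputed attribute, not learned

conv = nn.Conv2d(3 + 1, 16, kernel_size=3, padding=1)        # widened to accept the extra channel
features = conv(torch.cat([rgb, object_mask], dim=1))
print(features.shape)  # torch.Size([1, 16, 64, 64])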

Twenkid commented 1 year ago

BTW, one cool torch feature I've just looked at:

https://pytorch.org/docs/stable/tensor_view.html

It is like NumPy slices: regions of interest, selective processing of a part of the tensor.
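A small illustration of that (plain PyTorch): a slice is a view that shares storage with the original tensor, so in-place edits to the region show up in both.

import torch

t = torch.zeros(4, 4)
roi = t[1:3, 1:3]   # a view onto the central 2x2 region of interest
roi += 1.0          # in-place update modifies the underlying tensor too
print(t)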

dobkeratops commented 1 year ago

Trying to train this U-Net from scratch to predict left/right halves of people. So far my data is unbalanced, so it's only learning to colour the left/right sides in a predictable way. It is at least starting to identify where the people are.

[attached training-progress images: training_progress19, training_progress6, training_progress18, training_progress10]

Twenkid commented 1 year ago

Cool! The bad one is a difficult case if there are not enough examples like that, or not enough sampling of them. What is the dataset - ImageMonkey? How many images?

Twenkid commented 1 year ago

If I read the colour code correctly, it seems the two-person pic also has two halves each (as it's supposed to), i.e. 4 halves, while the model has tried to produce only two halves, as if there was one person - that's why one of the persons is "spilled" into the other one's mask.

dobkeratops commented 1 year ago

Yes, it's ImageMonkey. I'm using about 1000 images here; there's more data available there, but it will need 'multitask' training - i.e. it's got some left/right halves of people, some arms/legs, heads/hands, and a few with all of that. I also need to make it flip, and get left/right correct for that. It's still going - been about 3 hours. The error graph is still on a consistent downtrend, which is encouraging.

This also confirms for me that it's actually able to learn something in the deeper layers. Initially, just doing denoisers, I was suspicious that the shallow layers could do all the work trivially.

I want to see how much net depth I can actually train from scratch.

dobkeratops commented 1 year ago

There are a few where it's started to learn that left/right are reversed when seeing the back. [attached images: training_progress81, training_progress111]

Twenkid commented 1 year ago

For experiments any size is fine and there are good results, but for application in the real world IMO 1000 images is too few. There could be an obvious postprocessing step - the two halves are supposed to be adjacent etc. It could be done without an NN (just fill until touching the other part etc.).

dobkeratops commented 1 year ago

One step at a time.. I want to see how far I can get training from the ground up. I'm aware prior 'DensePose' nets used ~50,000 annotations. I'll see how far I can get with multi-task training. In time the plan is to train with different related labels, and maybe find other ways of hinting related information (low-poly art? video of people?).

The next step here is to use the gender and arms/hands annotations - it's just more code, and I need to test each step along the way, and the experiments take time to run. The result today is that the RTX 3080 can train a deep net to transform images, learning something non-trivial, in 3 hours, within my electricity and patience budgets. I had no idea if this was going to take 30 minutes or 3 days or what.

Up until now I hadn't bothered actually experimenting with AI much at all, being put off by the training times - and using pretrained nets seemed less exciting because the capabilities are ultimately bounded by the pretraining.

Twenkid commented 1 year ago

Great!

one step at a time..

I am not in charge and don't set deadlines or urge you. Of course you decide what exactly to do, how much effort to invest, and when. I share my thoughts on what I see and understand, trying to help if I can.

I mentioned that method because sometimes there are simpler ones, or compromises, which do the job for small gaps, but one may tend to continue along the current track or search for perfection without considering "hacks" which may save time.
