SkyWa7ch3r / ImageSegmentation

This project is part of the Pawsey Summer Internship, in which I test multiple semantic segmentation algorithms and models and compare their training and inference times. Time permitting, there will also be experiments with panoptic segmentation, which combines semantic and instance segmentation.

Help on training #1

Closed johnSmith1990 closed 4 years ago

johnSmith1990 commented 4 years ago

Hi, thanks for your great project. I want to train the network on the Cityscapes dataset, but I got this error:

    import cityscapesscripts.helpers.labels as labels
    ModuleNotFoundError: No module named 'cityscapesscripts'

SkyWa7ch3r commented 4 years ago

Hi There!

Thank you!

It appears you don't have cityscapesscripts installed. It can be installed with pip:

python -m pip install cityscapesscripts

The GitHub repo for cityscapesscripts is below, with more instructions on using the dataset and understanding its directory structure and API.

https://github.com/mcordts/cityscapesScripts
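As a quick sanity check after installing, something like the following can confirm that the package is visible to the interpreter that runs the training script. This is only a sketch: `is_installed` is a hypothetical helper, not part of this repo or of cityscapesscripts, and the lookup of the `road` label is just an illustration of the `labels` helper module that the failing import refers to.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("cityscapesscripts"):
    # name2label maps a class name to its Label record (id, trainId, color, ...).
    from cityscapesscripts.helpers.labels import name2label

    print("road trainId:", name2label["road"].trainId)
else:
    print("cityscapesscripts is missing; run: python -m pip install cityscapesscripts")
```

If the check fails even after installing, the usual cause is that pip installed into a different environment than the one running the script, which is why the command above uses `python -m pip`.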

Thanks Joel

johnSmith1990 commented 4 years ago

Thanks, Joel. I am going to customize your project. It supports many networks, but I want to use only Fast-SCNN, without Horovod. Regards, John

SkyWa7ch3r commented 4 years ago

No worries, good luck.

Although I will say, if you are doing multi-GPU training on HPC, Horovod is an excellent solution, as it makes good use of the resources provided by job schedulers like Slurm.

If you don't have any further questions, let me know and I'll close the ticket.

Thanks Joel

johnSmith1990 commented 4 years ago

Thanks. Could you share code for training Fast-SCNN on a single GPU? There is some sample code for the model, but it doesn't include scripts for training and testing.

regards, John

SkyWa7ch3r commented 4 years ago

Hi John

This repo is my code for training (train.py) and testing (predict.py) on a single GPU or more.

There are plenty of code examples all over the web for training different models on the Cityscapes dataset in PyTorch and TensorFlow (some of which I have referenced in my own repo).

Please research and refer to those, as well as my own repo, to create and train your model on Cityscapes.

Thank you Joel

johnSmith1990 commented 4 years ago

Horovod doesn't install successfully. It gives this error:

ERROR: Command errored out with exit status 1: /home/amin/anaconda3/envs/tensor2_env/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-3tr9w6at/horovod/setup.py'"'"'; __file__='"'"'/tmp/pip-install-3tr9w6at/horovod/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-mrjkw7mu/install-record.txt --single-version-externally-managed --compile --install-headers /home/amin/anaconda3/envs/tensor2_env/include/python3.7m/horovod Check the logs for full command output.

johnSmith1990 commented 4 years ago

Can you share a pretrained model so I can test the model's FPS? I want to train the model on my custom dataset.

johnSmith1990 commented 4 years ago

What are the values of hvd.rank and hvd.size for a single GPU? I can't install Horovod.

SkyWa7ch3r commented 4 years ago

> Can you share a pretrained model so I can test the model's FPS? I want to train the model on my custom dataset.

Please look through my repo completely before asking questions. The weights from my training runs are inside my results directory.
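For the FPS question itself, a rough timing loop is usually enough. The sketch below is generic and makes assumptions: `measure_fps` and `predict_fn` are hypothetical names, and `predict_fn` stands for whatever callable runs one forward pass with the loaded weights and returns only once the result is ready. It is not the timing code used in this repo.

```python
import time
from typing import Any, Callable


def measure_fps(predict_fn: Callable[[Any], Any], batch: Any,
                warmup: int = 5, iterations: int = 50) -> float:
    """Return the number of predict_fn calls per second on a fixed input."""
    # Warm-up calls are excluded so one-off costs (graph tracing,
    # memory allocation) do not distort the measurement.
    for _ in range(warmup):
        predict_fn(batch)

    start = time.perf_counter()
    for _ in range(iterations):
        predict_fn(batch)
    elapsed = time.perf_counter() - start

    # Guard against a zero interval for extremely fast callables.
    return iterations / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Stand-in workload; replace with a real forward pass.
    print(f"{measure_fps(sum, list(range(1000))):.1f} calls/s")
```

If each call processes a batch of several images, multiply the result by the batch size to get images per second.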

> Horovod doesn't install successfully. It gives this error:
>
> ERROR: Command errored out with exit status 1: /home/amin/anaconda3/envs/tensor2_env/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-3tr9w6at/horovod/setup.py'"'"'; __file__='"'"'/tmp/pip-install-3tr9w6at/horovod/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-mrjkw7mu/install-record.txt --single-version-externally-managed --compile --install-headers /home/amin/anaconda3/envs/tensor2_env/include/python3.7m/horovod Check the logs for full command output.

I won't be able to help you install Horovod, as the right setup depends on what your needs are. I installed MPI for my configuration, but you may not need that. Please refer to here: https://horovod.readthedocs.io/en/stable/install_include.html and here: https://github.com/horovod/horovod You may also need to ensure a compatible g++ compiler is installed.

> What are the values of hvd.rank and hvd.size for a single GPU? I can't install Horovod.

Please refer to the API documentation; that is why it exists: https://horovod.readthedocs.io/en/stable/api.html

hvd.size() should be 1. hvd.rank() depends on the process, but with a single GPU it will likely only return 0.

If you write your code right, it shouldn't matter what the size is: your training regime should work for any value returned by hvd.size(). The exception is verbose output such as epoch progress and console logging, which should be done by a single rank.
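To illustrate the point about size and rank, here is a minimal sketch of a single-process fallback for when Horovod cannot be imported. `SingleProcessHvd`, `scaled_learning_rate` and `is_chief` are hypothetical names, not part of Horovod or of this repo. The stub only covers `init()`, `size()`, `rank()` and `local_rank()`; anything else a training script uses (a distributed optimizer wrapper, broadcast callbacks) would need its own handling.

```python
class SingleProcessHvd:
    """Hypothetical stand-in reporting the values a one-process run would see."""

    @staticmethod
    def init() -> None:
        # Nothing to set up when there is only one process.
        pass

    @staticmethod
    def size() -> int:
        return 1  # total number of worker processes

    @staticmethod
    def rank() -> int:
        return 0  # global index of this process

    @staticmethod
    def local_rank() -> int:
        return 0  # index of this process on its node


try:
    import horovod.tensorflow.keras as hvd
except ImportError:
    hvd = SingleProcessHvd  # assumption: fall back to the stub above

hvd.init()


def scaled_learning_rate(base_lr: float) -> float:
    """Scale the learning rate by worker count; a no-op when size() is 1."""
    return base_lr * hvd.size()


def is_chief() -> bool:
    """Only rank 0 should print progress or write checkpoints."""
    return hvd.rank() == 0


if is_chief():
    print(f"workers={hvd.size()} lr={scaled_learning_rate(0.01)}")
```

Written this way, the same script behaves identically on one GPU and scales its learning rate and logging correctly when launched with more workers.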

I am a very busy person (multiple projects), and I unfortunately do not have the time to solve every problem you run into. Please refer to the Horovod and TensorFlow documentation to understand functions and to solve installation issues.

Please note that all of these packages have likely been updated, and you will likely need to adjust my code to make it compatible with the versions you have.

If you have any more questions directly related to my repo, then please ask, but I cannot help you figure out your own code or your installation issues.

Thank you Joel

johnSmith1990 commented 4 years ago

Thank you. If you can't help anyone use your repo, why do you share your code? To waste time? Please close the issue and delete your repo so it doesn't waste others' time anymore.

regards.

SkyWa7ch3r commented 4 years ago

> Thank you. If you can't help anyone use your repo, why do you share your code? To waste time? Please close the issue and delete your repo so it doesn't waste others' time anymore.
>
> regards.

I share it so people can use the code, including my training code. You are not the only one who has used the code and asked me questions. I have also helped an organization make tutorials from my code, based on their feedback from using it in their system.

Please do not accuse me of wasting people's time; that is very rude and unappreciated. You have asked questions that are beyond the scope of GitHub issues, which are reserved for bugs, feature requests, changes needed to the repo, or problems using my code. They are not for full answers on how to install packages, or for answers that can be found in the API reference or documentation of the packages you are installing.

Please respect your colleagues' time and effort by going through the same toil as I did: reading about and researching the packages you use before using them.

I have 3 separate projects and a day job at the moment. I have used what little personal time I have to help you, and you have responded to me rudely.

I will not be deleting my repo, as others are finding it useful and have forked it. To others who are reading this, I want to assure you that I take every issue seriously.

If you make a post with code that includes my own and have issues running it, then I will help you in any way that I can.

Kind Regards Joel