Open anuradhakar49 opened 3 years ago
Can you please provide the log files showing what the exact error is?
Unfortunately this information is too high-level for us to provide any insights
On Sun, Oct 17, 2021 at 7:46 PM Anuradha Kar @.***> wrote:
Hi, The installation of the tool runs smoothly as described in the Github repository but I am encountering problems with retraining the deep learning model. For example, after adding 2 pairs of images in a new project, making patches and annotations and uploading them as training and test images, if we click "Retrain model" on the Project page, I am getting the ERROR: train_autoencoder (job N) failed. On the Annotations page, clicking the "Retrain DL" button displays an HTML error.
Please provide suggestions on how to resolve these errors. Anuradha Kar
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/choosehappy/QuickAnnotator/issues/13, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACJ3XTFJJTOXKGPSG63EPU3UHMDWXANCNFSM5GFBYJCA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
Hi @anuradhakar49 and @choosehappy
Were you able to solve the issue? I'm having the same problem:
2021-11-25 13:54:10,872 [INFO] (THREAD 18304) About to train a new transfer model for try2
2021-11-25 13:54:10,887 INFO sqlalchemy.engine.base.Engine ROLLBACK
2021-11-25 13:54:10,887 [INFO] (THREAD 18304) ROLLBACK
2021-11-25 13:54:10,888 [INFO] (THREAD 18304) 127.0.0.1 - - [25/Nov/2021 13:54:10] "GET /api/try2/retrain_dl?frommodelid=0 HTTP/1.1" 404 -
System: Win10, python 3.8, cuda 10.2
Best regards, Mario
Sorry to hear this Mario!
Is this information you're putting here from the command line itself, or is it coming from the log file?
If you can send over the entire associated log file that would be appreciated
In the end, we were able to fix anuradhakar49's problem; it was environmental. If I remember correctly, it was an incompatible CUDA driver + CUDA version? @tasvora may have additional info.
Yes, it was an environment issue related to CUDA, but we did not get to look at it in detail, as Anuradha decided to use Linux and it worked fine there.
Regards, Tasneem
Yes, this issue is solved; it was linked to the CUDA + torch versions. @mariokreutzfeldt, please check that you have a CUDA-compatible GPU and that your code is able to access the GPU (i.e., the GPU is not busy with another task). Also make sure the PyTorch version is compatible with CUDA 10.2 (https://pytorch.org/get-started/previous-versions/). Otherwise, try reinstalling with the CPU-only version of torch to test.
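For anyone who wants to run these checks in one go, here is a minimal sketch. The function name `describe_torch_cuda` is made up for illustration; it only uses standard PyTorch attributes (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`) and returns an empty report if torch is not installed.

```python
import importlib


def describe_torch_cuda():
    """Return a small report on whether PyTorch can see a CUDA device.

    Hypothetical helper for illustration; it is not part of QuickAnnotator.
    """
    report = {
        "torch_installed": False,
        "torch_version": None,
        "built_for_cuda": None,
        "cuda_available": False,
        "device_name": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return report  # torch is not installed in this environment

    report["torch_installed"] = True
    report["torch_version"] = str(torch.__version__)
    # torch.version.cuda is None for CPU-only builds and e.g. "10.2" for cu102 builds
    report["built_for_cuda"] = torch.version.cuda
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` is None while a GPU is present, the installed torch is probably a CPU-only build, which would be worth fixing before looking any further.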
Dear all, thank you for your fast replies!!
I have verified the CUDA installation via nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:32:27_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.2, V10.2.89
and pytorch installation via torch.cuda.is_available()
true
During installation of QA I ran into many unresolvable version issues. So I ended up installing the following.
numpy==1.17.3
Flask_SQLAlchemy==2.4.0
scikit_image==0.16.2
scikit_learn==0.24.0
opencv_python_headless==4.1.2.30
scipy==1.4.1
requests==2.22.0
SQLAlchemy==1.3.5
tensorboard==2.4.1
ttach==0.0.2
albumentations==0.4.3
config==0.4.2
Flask==1.0.3
Pillow==8.1.2
llvmlite==0.34.0
numba
umap-learn
Flask_Restless==0.17.0
python-openslide==1.1.2
For PyTorch, the automatic installation had already failed for another project, so I downloaded the packages manually:
torch 1.8.1+cu102
torchaudio 0.10.0+cu102
torchvision 0.9.1+cu102
I installed torch first. When I installed torchaudio and torchvision, pip would uninstall torch and replace it with a non-CUDA version. So I installed torch+cu102 again after having installed torchaudio and torchvision.
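A quick way to see which builds actually ended up installed after this kind of back-and-forth is to read the version strings from the package metadata and look at the part after the `+` (for example `cu102` or `cpu`). The sketch below uses only the standard library; the helper names are made up for illustration. To my knowledge torchaudio 0.10.0 was released alongside torch 1.10.0 rather than 1.8.1, which might explain why pip replaced torch, but treat that as an assumption to verify against the PyTorch previous-versions page.

```python
from importlib import metadata


def local_build_tag(version):
    """Return the part after '+' in a version string, or None if there is none."""
    _, sep, tag = version.partition("+")
    return tag if sep else None


def report_builds(packages=("torch", "torchvision", "torchaudio")):
    """Map each package name to (version, build tag), or None if it is not installed."""
    result = {}
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
            continue
        result[name] = (version, local_build_tag(version))
    return result


if __name__ == "__main__":
    for name, info in report_builds().items():
        print(name, info)
```

If the three packages do not all show the same build tag, the environment is mixing CUDA and CPU wheels.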
@choosehappy, the complete log is here
Best regards, Mario
Quick additional info: replacing the CUDA with CPU versions of pytorch did not solve it. Still getting ERROR 404.
it does look like this environment is really going to be the issue. those libraries have been tested to work together and are what is used to create, e.g., our docker files
unfortunately this log file doesn't appear to contain anything interesting. can you also upload all data.* files? there might be up to 3 of them:
data.db, data.db-shm, data.db-wal
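Before sending the database over, it can help to check whether it contains anything at all. The following is a minimal sketch that lists every table in a SQLite file together with its row count. It makes no assumption about QuickAnnotator's schema; the function name is hypothetical. It is safest to run it on a copy of data.db, since opening a SQLite file can create or touch the accompanying -shm and -wal files.

```python
import sqlite3


def summarize_sqlite(path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    conn = sqlite3.connect(path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for table in tables:
            # Table names are read from sqlite_master, so double-quoting them is enough here
            counts[table] = conn.execute(
                f'SELECT COUNT(*) FROM "{table}"'
            ).fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        for name, count in summarize_sqlite(sys.argv[1]).items():
            print(f"{name}: {count} rows")
```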
@choosehappy here you go: https://www.dropbox.com/t/wwWRuHA61zpwkpTn
It doesn't contain data.db-wal because the file was 0 KB.
Okay, this database looks like it was cleaned out
It looks like you restarted quick annotator after you had the error, which by default goes through and clears out old jobs
Can you set this line:
to False
reproduce your error and send back over?
Also in addition to that.
If you could copy everything you see on the console from which you start the Quick Annotator application, save it as a text file, and send that too, it would help; maybe there is a specific library error we are missing.
Regards Tasneem
Here are the log files and the data.db after changing the config. I am using the CPU version of PyTorch now and have seen that one project is giving me a "not enough training/test images" message, which makes sense. The second project is still giving error 400.
hmm...i think we'll have to jump on a call, these log files and database seem to indicate that things are working as expected : )
Thank you @choosehappy and @tasvora for helping solve this issue! In case someone else is having this problem, it turned out that I had a broken svml_dispmd.dll (730kb instead of 18MB). Also, make sure scikit-image==0.18.1 is installed.
Best regards, Mario
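For anyone who wants to check for the same truncated DLL, here is a minimal sketch that searches a directory tree for a file name and reports the size of each copy. Searching under `sys.prefix` (the active Python environment) is an assumption on my part; the thread does not say where the broken file was located. The function name is hypothetical.

```python
import sys
from pathlib import Path


def find_file_sizes(filename, root=sys.prefix):
    """Return {path: size in bytes} for every file called `filename` under `root`."""
    sizes = {}
    for path in Path(root).rglob(filename):
        if path.is_file():
            sizes[str(path)] = path.stat().st_size
    return sizes


if __name__ == "__main__":
    # Mario reported about 730 KB for the broken copy versus about 18 MB for a good one
    for path, size in find_file_sizes("svml_dispmd.dll").items():
        print(f"{size / 1024:10.0f} KB  {path}")
```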
Hi @choosehappy and @mariokreutzfeldt, I have the same problem with Retrain DL in QuickAnnotator. After annotating a patch, when I ran Retrain DL - From base, I got an error message like "ERROR 404: (Unknown error)". The screenshot is below. The console log is:
2022-06-09 08:49:11,130 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2022-06-09 08:49:11,130 [INFO] (THREAD 139621868058368) BEGIN (implicit)
2022-06-09 08:49:11,131 INFO sqlalchemy.engine.base.Engine SELECT project.id AS project_id, project.name AS project_name, project.description AS project_description, project.date AS project_date, project.train_ae_time AS project_train_ae_time, project.make_patches_time AS project_make_patches_time, project.iteration AS project_iteration, project.embed_iteration AS project_embed_iteration FROM project WHERE project.name = ? LIMIT ? OFFSET ?
2022-06-09 08:49:11,131 [INFO] (THREAD 139621868058368) SELECT project.id AS project_id, project.name AS project_name, project.description AS project_description, project.date AS project_date, project.train_ae_time AS project_train_ae_time, project.make_patches_time AS project_make_patches_time, project.iteration AS project_iteration, project.embed_iteration AS project_embed_iteration FROM project WHERE project.name = ? LIMIT ? OFFSET ?
2022-06-09 08:49:11,131 INFO sqlalchemy.engine.base.Engine ('test1', 1, 0)
2022-06-09 08:49:11,131 [INFO] (THREAD 139621868058368) ('test1', 1, 0)
2022-06-09 08:49:11,131 [INFO] (THREAD 139621868058368) About to train a new transfer model for test1
2022-06-09 08:49:11,131 [INFO] (THREAD 139621868058368) About to train a new transfer model for test1
2022-06-09 08:49:11,132 INFO sqlalchemy.engine.base.Engine ROLLBACK
2022-06-09 08:49:11,132 [INFO] (THREAD 139621868058368) ROLLBACK
2022-06-09 08:49:11,132 [INFO] (THREAD 139621868058368) 124.126.17.86 - - [09/Jun/2022 08:49:11] "GET /api/test1/retrain_dl?frommodelid=0 HTTP/1.1" 404 -
Following the earlier discussion in this thread, I checked my CUDA and PyTorch versions, which are compatible; torch.cuda.is_available() returns true. I hope I can get help with this issue. Best regards, Xiaoping
we can start by collecting more information: 1) operating system + version 2) python version 3) pip freeze output 4) cuda version 5) Nvidia GPU version
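The five items above can be gathered with a short script. This is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes `nvcc` and `nvidia-smi` are on the PATH (entries come back as None when a tool is missing or fails).

```python
import platform
import subprocess
import sys


def run_quietly(command):
    """Run a command and return its stdout, or None if it is unavailable or fails."""
    try:
        done = subprocess.run(command, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return done.stdout.strip() if done.returncode == 0 else None


def collect_environment():
    """Collect OS, Python, pip freeze, CUDA toolkit, and GPU information."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "pip_freeze": run_quietly([sys.executable, "-m", "pip", "freeze"]),
        "nvcc": run_quietly(["nvcc", "--version"]),
        "gpu": run_quietly(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]
        ),
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"== {key} ==\n{value}\n")
```

Redirecting the output to a text file makes it easy to attach to the issue.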
Sure.
hmmm!! this all looks very reasonable!
is there any additional information in the console window at the top of the screen on the right?
In looking at the API itself and the console information you provided, the only 404 message that seems reasonable is here:
This would seem to suggest that you don't have a base model already trained? is that the case?
if you look here:
https://github.com/choosehappy/QuickAnnotator/wiki/Image-List-Page
did you use the "3. (re)train model 0" button?
this step is needed to give good default weights
Thanks @choosehappy. I didn't use the "3. (re)train model 0" button before. When I use the "3. (re)train model 0" button, I get this error message in the console: "TypeError: Descriptors cannot not be created directly. If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0. If you cannot immediately regenerate your protos, some other possible workarounds are:
- Downgrade the protobuf package to 3.20.x or lower.
- Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower)."
After downgrading the protobuf package to 3.19.1, "3. (re)train model 0" and the Retrain DL function work. The problem is solved. Thanks for your help! 👍
Fantastic! so you're all set?
did you encounter this problem when using the provided docker file, or were you using your own base operating system?
Yes, I can now use the QuickAnnotator Retrain DL function. I didn't use Docker; I just installed the package on my operating system.
Got it, thanks
Yes, protobuf can be a tricky one to maintain at the os level :)
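For anyone landing here with the same TypeError, a minimal sketch of a check for whether the installed protobuf is newer than the 3.20.x series named in the error text. It uses only the standard library; the helper names are made up for illustration, and the threshold comes from the error message quoted above rather than from any QuickAnnotator requirement.

```python
from importlib import metadata
from itertools import takewhile


def parse_version(version):
    """Turn '3.19.1' into (3, 19, 1), keeping only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def exceeds_3_20(version):
    """True if `version` is newer than the 3.20.x series."""
    return parse_version(version)[:2] > (3, 20)


def installed_protobuf():
    """Return the installed protobuf version string, or None if it is not installed."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_protobuf()
    if found is None:
        print("protobuf is not installed")
    elif exceeds_3_20(found):
        print(f"protobuf {found} is newer than 3.20.x; a downgrade may be needed")
    else:
        print(f"protobuf {found} is within the range the error message suggests")
```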