BelalC / sign2text

Real-time AI-powered translation of American Sign Language fingerspelling (sign to text)

Request for the training set #5

Closed joohaeng closed 4 years ago

joohaeng commented 6 years ago

@BelalC, I tried to find some info on the BU site, but it was not easy for me to get any. Do you have any plans to share the training set used in this project? Thank you again for sharing this nice project.

BelalC commented 6 years ago

It's a little tricky to get all the data from the BU website (but definitely possible). And yes, I am planning on sharing the training set I used for the project and hopefully expanding the dataset further. Updates coming in the next few weeks.


csyhping commented 6 years ago

@BelalC, hi, are you still working on this project? Could you please share how to get the training set? Thanks again for your nice work.

Purinsuu commented 6 years ago

Training set please. :(

BelalC commented 6 years ago

Hi guys - yes, the project is still active(-ish).

For my training, I combined the Massey University ASL dataset (~2524 images) with data I generated from colleagues and myself - only 321 images (approximately 12 images per class). The Massey University dataset is already public, so there's probably no point in sharing that with you guys. I'll update the README with links to that dataset plus a few others I've come across which you can download directly.

I'm happy to share the data (321 images) I generated myself. I'm currently looking into sharing it as an open-source dataset on Kaggle or Amazon. Any suggestions on the best way to share? It's on a hard drive and I'd prefer not to pay to host the data online indefinitely! Let me know your thoughts, please.

Purinsuu commented 6 years ago

How about Google Drive?



BelalC commented 4 years ago

Hi all - please find below a Google Drive link to the data I generated myself and part 5 of the Massey University dataset (the other 4 parts are available online via their website: https://www.massey.ac.nz/~albarcza/gesture_dataset2012.html).

Google Drive link - https://drive.google.com/drive/folders/1-t8rgN3eOW99KGDy7U0HJhrbbwOe-5Wh?usp=sharing

All the data is already split into train/validation subsets and labelled with letters from A-Z. NOTE - the Massey dataset I've included is already pre-processed. I added padding because the images were oddly shaped, and also dropped a colour channel as there was a lot of green screen background in the images. Dropping the colour channel didn't cause any significant change in performance, so I've left that change in. You can get the raw data directly from Massey University.
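
For anyone who wants to apply similar pre-processing to the raw Massey images, here is a minimal sketch of the two steps described above (padding odd-shaped images and dropping a colour channel). It is not the exact code used for the shared data. The function names `pad_to_square` and `drop_channel` are hypothetical, and the padding colour (black), the centred placement, and the choice of which channel to drop (index 1, green in RGB order) are all assumptions, since the thread does not state them.

```python
import numpy as np


def pad_to_square(image, fill=0):
    """Pad an H x W x C array with `fill` so that height equals width.

    The original content is kept centred (an assumption; the thread
    does not say where the padding was added).
    """
    height, width = image.shape[:2]
    size = max(height, width)
    top = (size - height) // 2
    left = (size - width) // 2
    pad_width = [(top, size - height - top), (left, size - width - left)]
    # Leave any remaining axes (the colour channels) unpadded.
    pad_width += [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad_width, mode="constant", constant_values=fill)


def drop_channel(image, channel):
    """Remove one colour channel from an H x W x C array."""
    return np.delete(image, channel, axis=2)


# Hypothetical 40 x 60 RGB array standing in for one dataset image.
sample = np.random.randint(0, 256, size=(40, 60, 3), dtype=np.uint8)

squared = pad_to_square(sample)
# channel=1 (green in RGB order) is an assumption, not confirmed above.
two_channel = drop_channel(squared, channel=1)

print(squared.shape, two_channel.shape)
```

A real image can be loaded into an array with any image library before calling these functions; the channel order depends on the library used, so check it before picking the index to drop.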

Hope this helps :)