Build a Deep Learning model to predict geolocation of tweets

PolinaPanicheva commented 2 years ago

Context

The goal is to create and train a deep learning model that predicts the coordinates (latitude, longitude) of individual tweets. You are free to use any approach, but we have a few suggestions. Our current idea is to use a simple character-level CNN architecture that would capture the most prominent character sequences related to location-specific language variety, and probably the most common location names. We suggest avoiding complex linguistic features and structures in this model, specifically Named Entity Recognition and Linking. Please apply with a rough overview of your model architecture. There are no hard MSE or EER requirements: we are after a scalable model architecture that will allow us to increase the training dataset size later on.
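
As a rough illustration of the input side of a character-level CNN, the sketch below turns a tweet into a fixed-length sequence of character ids that an embedding layer followed by 1D convolutions could consume. The alphabet, the padding and unknown ids, and the 280-character cap are assumptions made for this example, not part of the challenge description.

```python
import string

# Hypothetical alphabet (an assumption for this sketch): lowercase ASCII,
# digits, some punctuation, and a few accented letters common in
# Spanish and Portuguese.
ALPHABET = string.ascii_lowercase + string.digits + " .,!?@#_-'áéíóúñãõçü"

PAD_ID = 0  # reserved for padding
UNK_ID = 1  # reserved for characters outside the alphabet
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}

MAX_LEN = 280  # assumed cap on the number of characters kept per tweet


def encode_tweet(text: str, max_len: int = MAX_LEN) -> list[int]:
    """Map a tweet to a list of exactly max_len character ids."""
    # Lowercase first, then truncate, so the result never exceeds max_len.
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in text.lower()[:max_len]]
    # Right-pad short tweets so every example has the same length.
    return ids + [PAD_ID] * (max_len - len(ids))
```

Each id would typically index a learned embedding, and the model would end in a two-unit regression head for latitude and longitude.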

Development dataset

We have 4M tweets from 3,361 locations covering South America, written in 2021. Each .csv file is named after its coordinates (latitude and longitude) and contains the text of the tweet (column text) and some meta-information.
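
A minimal sketch of turning one such file into (text, latitude, longitude) training examples follows. The exact file naming pattern is not specified here, so the example assumes a name holding exactly two signed decimal numbers, latitude first, joined by a separator other than a hyphen (for instance the hypothetical `-12.05_-77.04.csv`). The function names are illustrative; only the `text` column comes from the description above.

```python
import csv
import re
from pathlib import PurePath

# Matches signed decimal numbers such as -12.05 or 77.
COORD_RE = re.compile(r"-?\d+(?:\.\d+)?")


def coords_from_filename(filename: str) -> tuple[float, float]:
    """Extract (latitude, longitude) from a file name (assumed format)."""
    stem = PurePath(filename).stem  # drop the directory and the .csv suffix
    numbers = COORD_RE.findall(stem)
    if len(numbers) != 2:
        raise ValueError(f"expected two coordinates in {filename!r}")
    lat, lon = (float(n) for n in numbers)
    return lat, lon


def read_examples(filename: str, csv_file) -> list[tuple[str, float, float]]:
    """Pair every non-empty tweet text in an open CSV file with the file's coordinates."""
    lat, lon = coords_from_filename(filename)
    reader = csv.DictReader(csv_file)
    return [(row["text"], lat, lon) for row in reader if row.get("text")]
```

For a file on disk, `csv_file` would be opened with `open(path, newline="", encoding="utf-8")`, as the `csv` module documentation recommends.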

Deliverable

A model that takes tweet text as input and returns coordinates as output, plus the model evaluation metrics obtained on the development dataset, including Mean Absolute Error in kilometers. We will evaluate the model on a test dataset that is not shared here.
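
One plausible way to compute an error in kilometers is the mean great-circle (haversine) distance between predicted and true coordinates. The sketch below takes that interpretation, which is an assumption: the challenge text does not define the metric precisely. The function names and the mean Earth radius constant are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, assumed for this sketch


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def mean_error_km(predicted, actual) -> float:
    """Mean haversine distance over paired (lat, lon) predictions and targets."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(haversine_km(p[0], p[1], t[0], t[1])
                for p, t in zip(predicted, actual))
    return total / len(predicted)
```

Reporting the median distance alongside the mean can also be informative, since a few tweets predicted on the wrong side of the continent can dominate the mean.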

Resources

Read this article for inspiration.
Request access to and download the development dataset.
Message us at challenge@inca.digital with an overview of your model architecture, the obtained evaluation metrics, and your preferred payment method.
We will contact all applicants with working models, evaluate the model on a held-out test dataset, and schedule a 30-min interview to discuss further steps.
Don't hesitate to ask us questions by commenting in this issue or emailing us at challenge@inca.digital.

Successful submissions

🎉 @Lavriz successfully solved the challenge and was hired by Inca Digital.

Ta-nu-ki commented 2 years ago

I wonder if the GazPNE2 approach could work for this task, by linking the tweet and the place name to the corresponding coordinates. https://www.researchgate.net/publication/355711839_GazPNE2_A_general_place_name_extractor_for_tweets_fusing_gazetteers_deep_learning_and_transformer_models https://github.com/uhuohuy/GazPNE2

AsSugar13 commented 2 years ago

Um, is this still relevant? Or shouldn't I wait for feedback?

asyaisakova commented 2 years ago

Um, is this still relevant? Or shouldn't I wait for feedback?

Yes, all of our open challenges are relevant and available to complete. We contact everyone who has successfully completed the challenge about their further steps with us.

ildkhav commented 2 years ago

hi, check it out https://github.com/ildkhav/a-Deep-Learning-model-to-predict-geolocation-of-tweets.git

StopTestingRightNow commented 2 years ago

Here is my version https://github.com/StopTestingRightNow/Tweets_Geolocation

alinapark commented 2 years ago

@Lavriz successfully solved the challenge and was hired by Inca Digital.
