PokemonGoers / PredictPokemon-2

In this project we will apply machine learning to make TLN predictions (Time, Location and Name: where Pokemon will appear, at what date and time, and which Pokemon it will be) in Pokemon Go.
Apache License 2.0

Create data set to classify Pidgey #20

Closed: bensLine closed this issue 7 years ago

bensLine commented 8 years ago

According to the plots of the dummy and API data, most of the sightings in our data are of Pidgey. We therefore want to modify the existing data set so that it can be used to classify whether or not a sighting is a Pidgey.

You can use the dummy data for this classifier, since it has about 600 entries, whereas the apiData has about 2500. So if you want to save some time while building the classifier, stick to the small data set ;)

  1. Create an .arff file that contains:
    • timestamp, latitude and longitude as attributes, all numeric.
    • isPidgey as the class label, which is either true or false. The class label is true if the data entry has pokemonId == 16; otherwise it is false.
  2. Test whether you can open the data set in Weka.
  3. Upload the file (via a PR).
  4. Run Weka and see how the data set performs. Try different approaches, as Tatyana mentioned in the last two points of her comment. The rest of us will also test the data set.

To create the .arff file just adapt one of the existing scripts we already have in the repo.
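As an illustration of steps 1 and 2, here is a minimal Python sketch of what such a conversion could look like. It is not one of the repo's scripts. It assumes, hypothetically, that each sighting has already been loaded as a dict with the keys `timestamp` (numeric, e.g. Unix epoch seconds), `latitude`, `longitude` and `pokemonId`. The real field names in the dummy and API data may differ. The function name `to_arff` and the relation name `pidgey` are also made up for this example.

```python
from typing import Iterable, Mapping

PIDGEY_ID = 16  # pokemonId of Pidgey, as stated in the issue


def to_arff(sightings: Iterable[Mapping], relation: str = "pidgey") -> str:
    """Build an ARFF document with three numeric attributes and a binary class.

    Each sighting is assumed (hypothetical schema) to be a mapping with the
    keys 'timestamp', 'latitude', 'longitude' and 'pokemonId'.
    """
    # ARFF header: relation name, attribute declarations, then the data section.
    lines = [
        f"@relation {relation}",
        "",
        "@attribute timestamp numeric",
        "@attribute latitude numeric",
        "@attribute longitude numeric",
        "@attribute isPidgey {true,false}",  # nominal class label
        "",
        "@data",
    ]
    for s in sightings:
        # Label is true only for pokemonId == 16, false for every other Pokemon.
        label = "true" if int(s["pokemonId"]) == PIDGEY_ID else "false"
        lines.append(f"{s['timestamp']},{s['latitude']},{s['longitude']},{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample rows, only to show the output format.
    sample = [
        {"timestamp": 1470000000, "latitude": 48.15, "longitude": 11.57, "pokemonId": 16},
        {"timestamp": 1470000600, "latitude": 48.14, "longitude": 11.58, "pokemonId": 19},
    ]
    print(to_arff(sample))
```

Writing the label as the nominal attribute `{true,false}` means Weka treats `isPidgey` as a two-class target. Because most sightings are of Pidgey, the two classes will probably be imbalanced, and that is worth keeping in mind when comparing classifiers in step 4.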

goldbergtatyana commented 8 years ago

Hi there, the .arff file will change many, many times later. For now we need to concentrate on collecting as many features as possible and incorporating them into it. Then we will do feature selection (based on Weka statistics, we will throw some of the features out). Only after that does it make sense to train and test the model, not at this point.

MatthiasBaur commented 7 years ago

We have moved past the binary classifiers.