GlobalFishingWatch / training-data

Training data detecting fishing behaviour in vessel movement patterns

Issues with loading data or Issues with the data #4

Open philipjhj opened 6 years ago

philipjhj commented 6 years ago

Either I have somehow pulled the data incorrectly, or there is an issue with the GPS location data.

After loading the data with the following code:

import numpy as np

paths_raw = []
for file in file_names:
    track = np.load(file)
    # The last two fields of each record are taken as (longitude, latitude)
    points = np.asarray([(x[-1], x[-2]) for x in track['x']])
    paths_raw.append(points)

where file_names are paths to these files:

GFW_training_data/data/tracks/33272339789416.npz
GFW_training_data/data/tracks/278417339378522.npz
GFW_training_data/data/tracks/43454304520674.npz
GFW_training_data/data/tracks/238534759947303.npz
GFW_training_data/data/tracks/54820430995875.npz
GFW_training_data/data/tracks/71554660259352.npz
GFW_training_data/data/tracks/232400805249883.npz
GFW_training_data/data/tracks/148554964687863.npz
GFW_training_data/data/tracks/280574352919673.npz
GFW_training_data/data/tracks/187372144064677.npz

Plotted on a basemap, the data looks like this: [image: bad_gps_data]

Not all tracks contain "jumping" coordinates, but e.g. GFW_training_data/data/tracks/71554660259352.npz looks like this: [image: bad_gps_data2]

How do you suggest cleaning this data?

Would you share your code for doing this, or update the data to a clean state?

sakalu commented 4 years ago

@philipjhj

Hi philipjhj,

I would like to use this data for my project, so I am wondering whether you have found a solution to your problem?

Saka

philipjhj commented 4 years ago

Hi Saka,

I believe we did not manage to solve the issue but simply cleaned the bad paths to the best of our abilities.

Unfortunately, it would be difficult for me to look into exactly what we did right now.
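For anyone hitting the same problem, here is a minimal sketch of one common way to drop "jumping" points. It is not the cleaning that was actually applied here. It assumes each track is an (n, 2) array of (longitude, latitude) in degrees, as built in the loading code above, and that the first point of a track is trustworthy. The names haversine_km and drop_jumps and the max_step_km threshold are hypothetical and would need tuning for real tracks.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clip guards against tiny floating-point overshoot outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def drop_jumps(points, max_step_km=50.0):
    """Keep only points within max_step_km of the previously kept point.

    points: array of shape (n, 2) holding (longitude, latitude) in degrees.
    Points outside the valid longitude/latitude ranges (or NaN) are dropped
    first. max_step_km is an assumed threshold, not a value from the dataset.
    """
    points = np.asarray(points, dtype=float)
    valid = (np.abs(points[:, 0]) <= 180) & (np.abs(points[:, 1]) <= 90)
    points = points[valid]
    if len(points) == 0:
        return points
    kept = [points[0]]  # assumption: the first valid point is a good fix
    for p in points[1:]:
        last = kept[-1]
        if haversine_km(last[0], last[1], p[0], p[1]) <= max_step_km:
            kept.append(p)
    return np.asarray(kept)
```

Usage would be something like paths_clean = [drop_jumps(p) for p in paths_raw]. Because the filter uses distance only, a threshold based on speed would be more robust if timestamps are available in the track records.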

Best, Philip

sakalu commented 4 years ago

@philipjhj

Hi Philip,

Thank you very much for your prompt reply.

I would like to take this opportunity to ask whether you ever successfully generated the folder "/merged" with 6 files (alex_crowd_sourced.npz, false_positives.npz, kristina_longliner.npz, kristina_ps.npz, kristina_trawl.npz, pybossa_project_3.npz) by following the instructions (run ./prepare.sh) at the following link?

"https://github.com/GlobalFishingWatch/training-data"

The reason I ask is that I generated all the *.npz files on my Windows system, but the results look incorrect: ALL points are labelled "-1" for 'is_fishing', and no "1" (fishing) is found.
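A quick way to check the label distribution in a generated file can be sketched as follows. The layout is an assumption: the array is taken to be stored under the key 'x' (as in the loading code earlier in this thread) as a structured array with an 'is_fishing' field. The function name label_counts is hypothetical.

```python
import numpy as np


def label_counts(npz_path, array_key="x", label_field="is_fishing"):
    """Return {label value: count} for one .npz file.

    array_key and label_field are assumptions about the file layout,
    not confirmed names from the repository.
    """
    with np.load(npz_path) as data:
        labels = data[array_key][label_field]
    values, counts = np.unique(labels, return_counts=True)
    return {float(v): int(c) for v, c in zip(values, counts)}
```

If every file returns only {-1.0: n}, the labels were likely lost when the files were generated rather than when they were loaded.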

Regards, Saka