data-preservation-programs / slingshot

Official public repository for feedback and data collection in Filecoin Slingshot
https://slingshot.filecoin.io

[dataset extension request] Smartcity #467

Closed by NiwanDao 2 years ago

NiwanDao commented 3 years ago

You can request to continue uploading an incompletely onboarded dataset to Slingshot if it previously qualified for rewards but no longer does per the list of curated datasets for Slingshot. Please note that these requests will be reviewed on a case by case basis and approvals are only for specific project teams to continue onboarding the specific dataset.

Slingshot participation information

Dataset onboarding progress

orvn commented 3 years ago

@xingjitansuo: Are you asking for extension into phase 2.6 (which concluded recently) or phase 2.7?

NiwanDao commented 3 years ago

@orvn, for phase 2.7.

orvn commented 3 years ago

Thanks! cc @dkkapur to evaluate

NiwanDao commented 2 years ago

@dkkapur is there any update here?

dkkapur commented 2 years ago

@xingjitansuo - approving this one.

@orvn can we enable this dataset for project team "XingjiTanSuo-smartcity"

orvn commented 2 years ago

@xingjitansuo is your project name XingjiTanSuo-smartcity? We couldn't find that, but did find: Smartcity- Sensor-based network and data analysis system? (link to project)

Is that the right one?

dkkapur commented 2 years ago

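@xingjitansuo - I just looked through the past conversations on this dataset, and it looks like we did not get the full context. Can you help me with:

  • who is generating the dataset?
  • who is paying for the data to be collected today?
  • is it available for a free download somewhere on the web today?
  • what city or cities is the data being captured in?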

dkkapur commented 2 years ago

Pending response to above questions before approving.

NiwanDao commented 2 years ago

> @xingjitansuo is your project name XingjiTanSuo-smartcity? We couldn't find that, but did find: Smartcity- Sensor-based network and data analysis system? (link to project)
>
> Is that the right one?

Correct

NiwanDao commented 2 years ago

@dkkapur @orvn

  • who is generating the dataset? The datasets are generated by a research lab at the University of Electronic Science and Technology of China.
  • who is paying for the data to be collected today? A university research fund.
  • is it available for a free download somewhere on the web today? Yes, you can access it at http://api.sr2.glm2m.com/index.php?r=smartcity-dataset%2Fdataset
  • what city or cities is the data being captured in? Most of the data is captured in Sichuan, China.

orvn commented 2 years ago

Thanks for the reply @xingjitansuo.

I have some more questions and concerns about issues I've been having with your app UI.

  1. Did something change with the app? Last week I was able to download files successfully (.wav files). However, now I get .db files. Is it just a MIME type issue or something else?

  2. I'm also occasionally getting 503 errors when making requests.
    [screenshot: Slingshot 2021-12-13 at 23 50 42]

  3. In order to work for Slingshot, these datasets need to be easily accessible by other participants. Is there a way for a user to download the full dataset from your app (either as one download or as a few chunks)?
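
One rough way to tell whether question 1 is only a naming or MIME type issue is to look at the file contents rather than the extension. The sketch below assumes standard RIFF/WAVE files, where bytes 0-3 are `RIFF` and bytes 8-11 are `WAVE`; the function names `is_wav` and `report_non_wav` are illustrative and not part of the app or of Slingshot.

```shell
#!/bin/sh
# Succeeds (exit status 0) only if the file starts with a RIFF/WAVE header.
# Assumption: the dataset's audio files are standard RIFF/WAVE containers.
is_wav() {
    riff=$(dd if="$1" bs=1 count=4 2>/dev/null)          # bytes 0-3
    wave=$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null)   # bytes 8-11
    [ "$riff" = "RIFF" ] && [ "$wave" = "WAVE" ]
}

# Print every regular file in a directory that does not have a WAV header.
report_non_wav() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        is_wav "$f" || echo "not WAV: $f"
    done
}
```

Running `report_non_wav ./downloads` on a folder of downloaded files would list the ones whose contents are not WAV audio, whatever their extension says.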

NiwanDao commented 2 years ago

That is a good callout. The server was down and has now been fixed. Please check one more time. The datasets are open to the public and are accessible by clicking the download button shown on the website. @orvn

orvn commented 2 years ago

@xingjitansuo: it seems to mostly work now, but I'm finding two issues:

[screenshot: db]

[screenshot: dataset-download]

NiwanDao commented 2 years ago

@orvn,

  1. 数据集下载 (Dataset Download) returns the static page where you can find all the wav files. This page does not support batch download. Anyone interested in a specific file can simply hit the download button next to that wav file.
  2. We expect this db file format error to happen occasionally. Since this dataset is normally used within a small group, it has not been worth checking the format of every file.

orvn commented 2 years ago

> This page does not support batch download

Does this mean that other participants can't download the full dataset (without running a script that scrapes your app UI)? @xingjitansuo

Because that would be a blocker to other participants using this dataset, since Slingshot users will normally try to fill 32 GiB sectors.

NiwanDao commented 2 years ago

@orvn I talked to the team, and they have added a batch download feature specifically for onboarding to Filecoin.

orvn commented 2 years ago

I tested the download from the list and it works, but the list needs a slight modification.

All file URLs in the list have an extra / character in the path (see the // below). Please fix this @xingjitansuo.

I ran a test on all 24k of your records with the extra / removed, and they all download successfully.

...
http://117.175.0.137/nfs/179//B3/data2/883.wav
http://117.175.0.137/nfs/179//B3/data2/898.wav
http://117.175.0.137/nfs/179//B3/data2/949.wav
http://117.175.0.137/nfs/179//B3/data2/932.wav
...
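
Until the list is fixed at the source, the extra slash can be removed on the client side. This is a minimal sketch that assumes the list has one URL per line; `normalize_urls` is an illustrative name. It collapses repeated slashes everywhere except the `//` right after the scheme, so it would also alter a `//` inside a query string.

```shell
#!/bin/sh
# Filter: read URLs on stdin, collapse runs of "/" into a single "/",
# leaving the "//" that follows "http:" or "https:" untouched.
normalize_urls() {
    sed 's#\([^:/]\)//*#\1/#g'
}
```

For example, piping the download list through `normalize_urls` turns `http://117.175.0.137/nfs/179//B3/data2/883.wav` into `http://117.175.0.137/nfs/179/B3/data2/883.wav`.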

To make it easier for other Slingshot users, I also think there should be a command that helps them download all files. @dkkapur, does that make sense to you?

A simple *nix-friendly version with no dependencies would look something like:

curl -s 'http://117.175.0.137/down.downlist?id=1' | xargs -I {} curl -O {}
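
One caveat with `curl -O` is that it saves each file under its remote base name only, so two files named `883.wav` in different directories would overwrite each other. Whether this dataset has such collisions is not stated in the thread. The sketch below is a slightly longer variant that mirrors the remote directory layout and retries transient failures such as the 503s mentioned earlier. It assumes the list endpoint returns one URL per line; `url_to_path` and `download_all` are illustrative names.

```shell
#!/bin/sh
# Map a URL to a relative local path by dropping the scheme and host.
url_to_path() {
    printf '%s\n' "$1" | sed 's#^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]*/##'
}

# Fetch the URL list, then download each file into a matching local directory.
# --retry makes curl retry transient errors (including HTTP 503);
# --create-dirs creates the directories needed for the -o path.
download_all() {
    list_url=$1
    curl -fsS --retry 5 "$list_url" | while IFS= read -r url; do
        [ -n "$url" ] || continue
        path=$(url_to_path "$url")
        curl -fsS --retry 5 --create-dirs -o "$path" "$url" \
            || echo "failed: $url" >&2
    done
}
```

It would be invoked as `download_all 'http://117.175.0.137/down.downlist?id=1'`, and failed URLs are printed to stderr so they can be retried later.
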
NiwanDao commented 2 years ago

@orvn Please check again. :)

orvn commented 2 years ago

The URLs in the download list are fixed. @dkkapur to review.

dkkapur commented 2 years ago

@xingjitansuo - let's proceed with approving this for current and future phases for now. Thanks for your hard work in making it easily available to others as well!

NiwanDao commented 2 years ago

Thanks for all your effort! @dkkapur @orvn

orvn commented 2 years ago

@xingjitansuo, you will find that your project is able to select the Smart City dataset.

@dkkapur, for now, I added it to the temporarily disallowed datasets, but if needed, you can make it a dataset available to everyone in the next phase at your discretion.

orvn commented 2 years ago

@xingjitansuo, the app URL appears to be down? Is this just temporary?

NiwanDao commented 2 years ago

@orvn I synced up with the team, and it should be back online next week.

orvn commented 2 years ago

Sounds good, thanks!