filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] <SYNC LIVE JAPAN INC.> - <Resubmit #123> #1095

Closed Sunkistn closed 1 year ago

Sunkistn commented 1 year ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

SYNC LIVE JAPAN INC. is committed to making the live entertainment business more creative, and its performances safer and more efficient.
In regards to the entertainment conducted in a variety of spaces, including outdoor events and amusement parks, and also in regard to sound design for cutting-edge VR or MR technology development, it is now becoming vital to not only perform sound adjustment using a DOLBY ATMOS base in order to create music that fills the entire available space, but also to link the music to video and the overall performance in complex and intricate ways in order to create a new form of entertainment.
We use AVID ProTools as our main DAW, and for sound design we mainly use Native Instruments. Our fundamental philosophy is to respond on the spot, quickly and accurately, to produce a final product perfectly in keeping with our client's desires, be that during a meeting or during rehearsals.

What is the primary source of funding for this project?

Company revenue

What other projects/ecosystem stakeholders is this project associated with?

No other projects/ecosystem stakeholders

Use-case details

Describe the data being stored onto Filecoin

Million Song Dataset: A freely-available collection of audio features and metadata for a million contemporary popular music tracks.
Free Music Archive: Founded in 2009 by radio station WFMU, offers free access to open licensed, original music. Many curators, netlabels and independent musicians around the world contributed to FMA's success. 
MusicNet Dataset: A curated collection of labeled classical music.

Where was the data in this dataset sourced from?

http://millionsongdataset.com/
https://github.com/mdeff/fma
https://freemusicarchive.org/
https://www.kaggle.com/imsparsh/musicnet-dataset

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://freemusicarchive.org/music/fma/music-insiders-by-free-music-archive-1/music-insiders-episode-2-simon-mathewson/
https://www.kaggle.com/datasets/imsparsh/musicnet-dataset
https://freemusicarchive.org/music/Derek_Clegg/life-unfolds-10-year-anniversary-re-issue/blind/

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes, it is.

What is the expected retrieval frequency for this data?

Maybe every day

For how long do you plan to keep this dataset stored on Filecoin?

More than 2 years.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

In Asia

How will you be distributing your data to storage providers? Is there an offline data transfer process?

The storage providers will download the data via the network.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

Anyone interested in this project and the music industry.

How will you be distributing deals across storage providers?

Deals will be distributed equally across storage providers, subject to each provider's capacity.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Have enough funds

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

raghavrmadya commented 1 year ago

Please provide relevant samples and state exactly what you will store with 5 PiB of DC. All information must be provided upfront and cannot be provided "later".

Sunkistn commented 1 year ago

> Please provide relevant samples and exactly what you will be stored with 5 PiBs of DC. All information must be provided upfront and cannot be provided "later"

http://millionsongdataset.com/ https://github.com/mdeff/fma https://freemusicarchive.org/ https://www.kaggle.com/imsparsh/musicnet-dataset

"Later" means that new data will be added if the business grows in the future. So far, the links above are all of our data sources.

Sunkistn commented 1 year ago

@raghavrmadya The previous application passed review and reached the third round of DataCap allocation. It is being resubmitted with a new account only because the previous GitHub account was flagged.

Sunkistn commented 1 year ago

@galen-mcandrew @Kevin-FF-USA @raghavrmadya This is the previous application information: sync sync2 sync3

raghavrmadya commented 1 year ago

Why are you requesting 5 PiBs when you have already received 200TiB?

raghavrmadya commented 1 year ago

"'Later' means that new data will be added if the business grows in the future." This is not acceptable. You can only apply for the amount of DC you need today, not for future projections.

raghavrmadya commented 1 year ago

Who are the SPs you are working with currently? Please share their SP IDs and request them to confirm by commenting on this application

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunkistn commented 1 year ago

> Who are the SPs you are working with currently? Please share their SP IDs and request them to confirm by commenting on this application

@raghavrmadya I have updated the application form. SP information can be queried here: https://filplus.info/allocation_record?client_address=f1ju5oabz45ceog6e7k5omdj56uspv2pzgghiyzdy&obj=eyJuYW1lIjoiU1lOQyBMSVZFIEpBUEFOIElOQy4iLCJpc3N1ZV9udW1iZXIiOiIxMjMifQ%3D%3D

I have asked them to comment here, but not all of them have GitHub accounts.

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunnyiscoming commented 1 year ago

If you only source your data from the above four sites, it is difficult to prove that you have 5 PB data storage needs. Hope you can give more explanation.

Sunnyiscoming commented 1 year ago

Any update here?

Sunkistn commented 1 year ago

> If you only source your data from the above four sites, it is difficult to prove that you have 5 PB data storage needs. Hope you can give more explanation.

@Sunnyiscoming We also have the following data sources:
https://registry.opendata.aws/pacific-sound/ (140 TiB)
https://registry.opendata.aws/elp-nouabale-landscape/ (60 TiB)
We plan to store 10 copies. Sorry for the late reply.
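
As a rough check of the figures in this reply, the following minimal sketch multiplies the two stated dataset sizes by the stated number of copies. It covers only the two sources whose sizes are given in the thread; the sizes of the music datasets are not stated anywhere in the application, so they are left out rather than guessed. The variable and function names are illustrative.

```python
# Sizes (TiB) exactly as quoted in the reply; the music datasets are omitted
# because the thread never states their sizes.
SOURCES_TIB = {
    "pacific-sound": 140,
    "elp-nouabale-landscape": 60,
}
COPIES = 10          # "we plan to store 10 copies"
TIB_PER_PIB = 1024


def replicated_need_tib(sources: dict[str, float], copies: int) -> float:
    """Total DataCap needed to store every source `copies` times."""
    return sum(sources.values()) * copies


need_tib = replicated_need_tib(SOURCES_TIB, COPIES)
need_pib = need_tib / TIB_PER_PIB
print(f"{need_tib} TiB = {need_pib:.2f} PiB")
```

Under these assumptions the two sized sources account for 2000 TiB, or about 1.95 PiB, which can be compared with the amount requested.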

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

4.7PiB

Expected weekly DataCap usage rate

100TiB

Client address

f1ju5oabz45ceog6e7k5omdj56uspv2pzgghiyzdy

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1ju5oabz45ceog6e7k5omdj56uspv2pzgghiyzdy

DataCap allocation requested

50TiB

Id

6c2ffb09-cdae-4488-908a-30ad4eb8ee00
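
To put the three figures from the request trigger and the bot's allocation comment side by side, here is a small arithmetic sketch. It only relates the numbers quoted above (4.7 PiB total, 100 TiB per week, 50 TiB first allocation); it does not claim to reproduce how the bot derives the first allocation, which the thread does not explain.

```python
TIB_PER_PIB = 1024

total_requested_tib = 4.7 * TIB_PER_PIB   # "Total DataCap requested: 4.7PiB"
weekly_rate_tib = 100                     # "Expected weekly DataCap usage rate"
first_allocation_tib = 50                 # "DataCap allocation requested"


def weeks_to_consume(total_tib: float, weekly_tib: float) -> float:
    """Weeks needed to use the whole request at a constant weekly rate."""
    if weekly_tib <= 0:
        raise ValueError("weekly rate must be positive")
    return total_tib / weekly_tib


weeks = weeks_to_consume(total_requested_tib, weekly_rate_tib)
share_of_total = first_allocation_tib / total_requested_tib
share_of_week = first_allocation_tib / weekly_rate_tib

print(f"weeks to consume request: {weeks:.1f}")
print(f"first allocation vs total: {share_of_total:.2%}")
print(f"first allocation vs weekly rate: {share_of_week:.0%}")
```

At the stated rate the full request would take roughly 48 weeks to consume, and the first allocation equals half of one week's expected usage.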

filplus-checker commented 1 year ago

DataCap and CID Checker Report[^1]

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

⚠️ 46.17% of total deal sealed by f01114587 are duplicate data.

⚠️ 37.50% of total deal sealed by f0867300 are duplicate data.

⚠️ 37.50% of total deal sealed by f0522948 are duplicate data.

⚠️ 37.50% of total deal sealed by f01227975 are duplicate data.

⚠️ 37.50% of total deal sealed by f01228000 are duplicate data.

⚠️ 37.50% of total deal sealed by f01228008 are duplicate data.

⚠️ 45.04% of total deal sealed by f0694908 are duplicate data.

⚠️ f0694908 has unknown IP location.

⚠️ f01075159 has unknown IP location.

⚠️ f0867429 has unknown IP location.

⚠️ f01016239 has unknown IP location.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01114587 (new) | Tokyo, Tokyo, JP | 57.97 TiB | 16.56% | 31.21 TiB | 46.17% |
| f0867300 | Tokyo, Tokyo, JP | 40.00 TiB | 11.43% | 25.00 TiB | 37.50% |
| f0522948 | Singapore, Singapore, SG | 40.00 TiB | 11.43% | 25.00 TiB | 37.50% |
| f01227975 | Hong Kong, Central and Western, HK | 40.00 TiB | 11.43% | 25.00 TiB | 37.50% |
| f01228000 | Seoul, Seoul, KR | 40.00 TiB | 11.43% | 25.00 TiB | 37.50% |
| f01228008 | Sydney, New South Wales, AU | 40.00 TiB | 11.43% | 25.00 TiB | 37.50% |
| f0694908 (new) | Unknown | 20.47 TiB | 5.85% | 11.25 TiB | 45.04% |
| f01075159 (new) | Unknown | 11.34 TiB | 3.24% | 11.34 TiB | 0.00% |
| f0867429 (new) | Unknown | 10.52 TiB | 3.01% | 9.90 TiB | 5.94% |
| f01016255 (new) | Grimstad, Agder, NO | 10.49 TiB | 3.00% | 9.90 TiB | 5.66% |
| f01228009 | Hong Kong, Central and Western, HK | 10.00 TiB | 2.86% | 10.00 TiB | 0.00% |
| f01228065 | Singapore, Singapore, SG | 10.00 TiB | 2.86% | 10.00 TiB | 0.00% |
| f0867298 (new) | Dehiwala-Mount Lavinia, Western, LK | 9.90 TiB | 2.83% | 9.90 TiB | 0.00% |
| f01016239 (new) | Unknown | 9.27 TiB | 2.65% | 9.27 TiB | 0.00% |
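
The duplicate percentages in the provider table appear to follow from the other two size columns as (total sealed minus unique data) divided by total sealed. The sketch below checks that reading against a few rows. This relationship is inferred from the numbers in the report, not taken from the checker's source code, and small differences are expected because the table values are already rounded to two decimals.

```python
def duplicate_percentage(total_tib: float, unique_tib: float) -> float:
    """Share of a provider's sealed deals that is not unique data, in percent."""
    if total_tib <= 0:
        raise ValueError("total sealed size must be positive")
    return (total_tib - unique_tib) / total_tib * 100


# (total sealed TiB, unique TiB, percentage printed in the report)
ROWS = {
    "f01114587": (57.97, 31.21, 46.17),
    "f0867300": (40.00, 25.00, 37.50),
    "f0694908": (20.47, 11.25, 45.04),
    "f01075159": (11.34, 11.34, 0.00),
}

for provider, (total, unique, reported) in ROWS.items():
    computed = duplicate_percentage(total, unique)
    print(f"{provider}: computed {computed:.2f}%, reported {reported:.2f}%")
```

For these rows the computed values agree with the reported ones to within a few hundredths of a percentage point.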

Provider Distribution

Deal Data Replication

The table below shows how each piece of unique data is replicated across storage providers.

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 31.30 TiB | 46.82 TiB | 1 | 13.38% |
| 1.25 TiB | 3.75 TiB | 2 | 1.07% |
| 640.00 GiB | 2.44 TiB | 3 | 0.70% |
| 19.27 TiB | 96.97 TiB | 4 | 27.71% |
| 25.00 TiB | 200.00 TiB | 5 | 57.15% |
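
The "Deal Percentage" column in the replication table looks like each row's total deal size divided by the sum over all rows. The following sketch recomputes it from the "Total Deals Made" column. As with the provider table, this is an interpretation of the reported numbers rather than the checker's documented formula.

```python
# (number of providers, total deals made in TiB) taken from the table above
REPLICATION_ROWS = [
    (1, 46.82),
    (2, 3.75),
    (3, 2.44),
    (4, 96.97),
    (5, 200.00),
]


def deal_percentages(rows: list[tuple[int, float]]) -> list[float]:
    """Each row's share of the summed deal size, in percent."""
    grand_total = sum(total for _, total in rows)
    return [total / grand_total * 100 for _, total in rows]


percentages = deal_percentages(REPLICATION_ROWS)
for (providers, total), pct in zip(REPLICATION_ROWS, percentages):
    print(f"{providers} provider(s): {total:.2f} TiB -> {pct:.2f}%")
```

Read this way, more than half of the sealed deal volume belongs to data that is stored with five providers.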

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, and their data should not resolve to the same CID.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Verifier |
| --- | --- | --- | --- | --- |
| f3wgfwtrs5p6jrkwfl2mksqa2ivgbgdjjrhjbefy3n7qzvotc3y6sazmp5gfyj7um6jlgdvlbiepzawnc6wxtq | FileDrive Labs | 166.92 TiB | 700 | LDN v3 multisig |
| f1pkrmygbvweykpjcut36lf7ewgqdfhjklbhvepda | Protocol Labs ( project: Slingshot Evergreen ) | 400.00 GiB | 13 | LDN # 293 |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

Sunnyiscoming commented 1 year ago

Hi, please explain the abnormal information. Some SPs have unknown IP locations.

Sunkistn commented 1 year ago

> Hi, please explain the abnormal information. Some Sps has unknown IP location.

These are SPs that I collaborated with around January 2022. However, my GitHub account was banned for a long time afterwards, which prevented me from obtaining datacap, resulting in the termination of our collaboration. There is no plan to continue working with them in the future.

Joss-Hua commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 7 storage providers sealed too much duplicate data - f01114587: 46.17%, f01227975: 37.50%, f01228000: 37.50%, f01228008: 37.50%, f0522948: 37.50%, f0867300: 37.50%, f0694908: 45.04%

⚠️ 2 storage providers have unknown IP location - f0694908, f01075159

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

Alex11801 commented 1 year ago

@Sunkistn

https://filplus.info/allocation_record?client_address=f1ju5oabz45ceog6e7k5omdj56uspv2pzgghiyzdy&obj=eyJuYW1lIjoiU1lOQyBMSVZFIEpBUEFOIElOQy4iLCJpc3N1ZV9udW1iZXIiOiIxMjMifQ%3D%3D

The page cannot be found.

Sunkistn commented 1 year ago

> @Sunkistn
>
> https://filplus.info/allocation_record?client_address=f1ju5oabz45ceog6e7k5omdj56uspv2pzgghiyzdy&obj=eyJuYW1lIjoiU1lOQyBMSVZFIEpBUEFOIElOQy4iLCJpc3N1ZV9udW1iZXIiOiIxMjMifQ%3D%3D
>
> The page cannot be found.

@Alex11801 This link appears to expire periodically. Please use the latest link: https://filplus.info/allocation_record?client_address=f1ju5oabz45ceog6e7k5omdj56uspv2pzgghiyzdy&obj=%7B%22name%22%3A%22SYNC%20LIVE%20JAPAN%20INC.%22,%22issue_number%22%3A%22123%22%7D
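
The two links posted in this thread differ only in how the `obj` query parameter is encoded: the earlier one carries a percent-encoded base64 string, the later one percent-encoded JSON. The sketch below decodes both with the Python standard library to show that they carry the same payload. Whether this encoding difference is what makes the older link fail is not stated in the thread, so treat that part as an open question.

```python
import base64
import json
from urllib.parse import parse_qs, urlparse

OLD_LINK = (
    "https://filplus.info/allocation_record"
    "?client_address=f1ju5oabz45ceog6e7k5omdj56uspv2pzgghiyzdy"
    "&obj=eyJuYW1lIjoiU1lOQyBMSVZFIEpBUEFOIElOQy4iLCJpc3N1ZV9udW1iZXIiOiIxMjMifQ%3D%3D"
)
NEW_LINK = (
    "https://filplus.info/allocation_record"
    "?client_address=f1ju5oabz45ceog6e7k5omdj56uspv2pzgghiyzdy"
    "&obj=%7B%22name%22%3A%22SYNC%20LIVE%20JAPAN%20INC.%22,"
    "%22issue_number%22%3A%22123%22%7D"
)


def obj_param(url: str) -> str:
    """Return the percent-decoded value of the `obj` query parameter."""
    return parse_qs(urlparse(url).query)["obj"][0]


# Older link: obj is base64-encoded JSON.
old_payload = json.loads(base64.b64decode(obj_param(OLD_LINK)))
# Newer link: obj is plain JSON once percent-decoding is done.
new_payload = json.loads(obj_param(NEW_LINK))

print(old_payload)
print(old_payload == new_payload)
```

Both links decode to the same applicant name and the same issue number, 123, which is the original application this issue resubmits.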

zcfil commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 7 storage providers sealed too much duplicate data - f01114587: 46.17%, f01227975: 37.50%, f01228000: 37.50%, f01228008: 37.50%, f0522948: 37.50%, f0867300: 37.50%, f0694908: 45.04%

⚠️ 2 storage providers have unknown IP location - f0694908, f01075159

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

sgclouder commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

⚠️ 7 storage providers sealed too much duplicate data - f01114587: 46.17%, f01227975: 37.50%, f01228000: 37.50%, f01228008: 37.50%, f0522948: 37.50%, f0867300: 37.50%, f0694908: 45.04%

⚠️ 2 storage providers have unknown IP location - f0694908, f01075159

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

large-datacap-requests[bot] commented 7 months ago

Thanks for your request! :exclamation: We have found some problems in the information provided.

We could not find Website / Social Media field in the information provided
We could not find Total amount of DataCap being requested (between 500 TiB and 5 PiB) field in the information provided
We could not find Weekly allocation of DataCap requested (usually between 1-100TiB) field in the information provided
We could not find On-chain address for first allocation field in the information provided
We could not find Data Type of Application field in the information provided

Please, take a look at the request and edit the body of the issue providing all the required information.

large-datacap-requests[bot] commented 5 months ago

RootKeyHolders have approved multisig account. You can now request first datacap release

large-datacap-requests[bot] commented 2 months ago

RootKeyHolders have approved multisig account. You can now request first datacap release