filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] <LaughStorage> - <Sloan Digital Sky Survey> #2128

Closed 26dos closed 1 year ago

26dos commented 1 year ago

Data Owner Name

LaughStorage

What is your role related to the dataset

Data Preparer

Data Owner Country/Region

China

Data Owner Industry

Not-for-Profit

Website

https://www.sdss.org/

Social Media

NA

Total amount of DataCap being requested

12PiB

Expected size of single dataset (one copy)

1.2PiB

Number of replicas to store

10

Weekly allocation of DataCap requested

1PiB
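The three figures above should be mutually consistent: one copy times the replica count should equal the total request, and the weekly rate determines how many weeks the request spans. A small Python sketch of that check, using the values from this form. It assumes the full 1 PiB weekly rate is actually granted; the allocations later in this thread were issued in 512 TiB tranches.

```python
import math


def total_datacap_pib(single_copy_pib: float, replicas: int) -> float:
    """DataCap needed to store `replicas` full copies of a dataset."""
    return single_copy_pib * replicas


def weeks_to_use(total_pib: float, weekly_pib: float) -> int:
    """Whole weeks needed to consume total_pib at weekly_pib per week."""
    return math.ceil(total_pib / weekly_pib)


# Values taken from the application form above.
single_copy_pib = 1.2
replicas = 10
requested_pib = 12.0
weekly_pib = 1.0

needed_pib = total_datacap_pib(single_copy_pib, replicas)
# Compare with a tolerance, since 1.2 is not exactly representable as a float.
matches = math.isclose(needed_pib, requested_pib)
weeks = weeks_to_use(requested_pib, weekly_pib)

print(f"{single_copy_pib} PiB x {replicas} = {needed_pib:.1f} PiB (matches request: {matches})")
print(f"{requested_pib:.0f} PiB at {weekly_pib:.0f} PiB/week takes {weeks} weeks")
```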

On-chain address for first allocation

f1gk53djusmlrwr2extftafi4m23agaacxxueh7aa

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

Identifier

No response

Share a brief history of your project and organization

I joined the Filecoin network in 2021 and began by sealing committed-capacity (CC) sectors. In 2022, we converted part of our CC to DataCap (DC) storage, and we now have a plan for continuous development of data storage. In 2023, I established a technical service company, LaughStorage, to invest in depth in the distributed storage track.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

The Sloan Digital Sky Survey (SDSS) is one of the most ambitious and influential surveys in the history of astronomy. 
The SDSS project was established in 2000 and has gone through five phases and 18 data releases (DRs); DR19 is planned for 2024.
Currently, about 750 TB of data is accessible, in 5.5 million directories and 400 million files.
DR1-DR7 (SDSS-I, 2000-2005; SDSS-II, 2005-2008): deep, multi-color images covering more than a quarter of the sky, and 3-dimensional maps containing more than 930,000 galaxies and more than 120,000 quasars.
DR8: contains all images from the SDSS telescope, the largest color image of the sky ever made. It also includes measurements for nearly 500 million stars and galaxies, and spectra of nearly two million objects.
DR9: contains the first public release of BOSS spectroscopy, as well as several significant updates to the cumulative SDSS archive.
DR10: contains the first release of APOGEE infrared Galactic spectroscopy, as well as cumulative updates to the BOSS optical extragalactic spectroscopy archive.
The SDSS began regular survey operations in 2000, after a decade of design and construction. It has progressed through several phases: SDSS-I (2000-2005), SDSS-II (2005-2008), SDSS-III (2008-2014), and SDSS-IV (2014-2020). Each of these phases has involved multiple surveys with interlocking science goals. The three surveys that comprise SDSS-IV are eBOSS (including SPIDERS and TDSS), APOGEE-2, and MaNGA (including MaStar).
The SDSS-V Pioneering Panoptic Spectroscopy program started observing in October 2020, and consists of three surveys, known in SDSS-V terminology as mapper programs.

Milky Way Mapper is a multi-object spectroscopic survey to obtain near-infrared and/or optical spectra of more than 4 million stars throughout the Milky Way and Local Group
Local Volume Mapper is an optical, integral-field spectroscopic survey that will target the Milky Way, Small and Large Magellanic Clouds, and other Local Volume galaxies
Black Hole Mapper is a multi-object spectroscopic survey that emphasizes optical spectra (often also with multiple epochs of spectroscopy) for more than 300,000 quasars

Where was the data currently stored in this dataset sourced from

My Own Storage Infra

If you answered "Other" in the previous question, enter the details here

No response

If you are a data preparer. What is your location (City and Country)

Mainland China, Hong Kong, North America, Singapore, South Korea, and Vietnam. I have machines in those areas, so I can download the data there.

If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

After processing the downloaded data, I will transfer it online.

If you are not preparing the data, who will prepare the data? (Provide name and business)

No response

Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

AFAIK, no.

Please share a sample of the data

http://classic.sdss.org/dr7/
http://sdss3.org/
https://www.sdss4.org/dr17/data_access/volume/
https://dr17.sdss.org/sas/dr17/apogee/spectro/aspcap/
https://dr17.sdss.org/sas/dr17/apogee/spectro/speclib/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

1-3 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, South America, Europe

How will you be distributing your data to storage providers

Cloud storage (i.e. S3), HTTP or FTP server, IPFS, Lotus built-in data transfer, Others

How do you plan to choose storage providers

Slack, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

f0861589 CN,SD C    
f02223876 HK, FTM Ltd.
f02212669 CN,Cryptomage
f02115125 KR, FiveByte
More SPs are being onboarded; the final selection depends on their available collateral.

How do you plan to make deals to your storage providers

Boost client

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

ghost commented 1 year ago

Hi @26dos

Per the requirements for Open, Public Dataset applicants in https://github.com/filecoin-project/notary-governance/issues/922, please complete the following Fil+ registration form to identify yourself as the applicant, and please also add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by Fil+ Governance team to confirm validity and then the application will be triggered for notary review. Let us know if you have any questions.

26dos commented 1 year ago

Hi @Filplus-govteam

The registration form has been submitted. Please take a look. Thank you.

ghost commented 1 year ago

Hello, the following SP entities were submitted: f0861589 (SDCLOUD, China), f02223876 (FTM Ltd., Hong Kong), f02115125 (FiveByte, Korea), f02212669 (Cryptomage, China).

No contact information was provided to confirm their locations.

26dos commented 1 year ago

Hello @Filplus-govteam, sorry, I'm a little confused about how you will confirm the SPs' locations through contact information. Doesn't the result shown in the report already indicate where they are?

ghost commented 1 year ago

Yes, this is the problem with the current process @26dos

Good deal making starts with applicants working with trusted SP Entities that can prove who they are and where they are. Anyone can list a set of miner IDs. How do we know who you are, who they are, where they are?

26dos commented 1 year ago

@Filplus-govteam

We have communicated with the SPs, and considering privacy concerns, they are not willing to have their contact information disclosed publicly. However, you can contact SDCLOUD via email at: sdcloud@zcpow.cn

herrehesse commented 1 year ago

@26dos Hello friend, thanks for your LDN application.

How far along are you with downloading the SDSS dataset? In previous Slingshot competitions we also stored most of it, and an 80 Gbps line to Europe was not enough to download this set in under 4 years. Their source is slow.

Can you prove to me that you have the files at your data preparation (DP) location?
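Why the source, rather than the link, is the bottleneck can be checked with back-of-the-envelope arithmetic. The sketch below assumes the roughly 750 TB figure from the application (read as decimal terabytes) and an ideal 80 Gbps transfer with no protocol overhead; both are simplifications, not measurements.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def transfer_seconds(size_bytes: float, rate_bps: float) -> float:
    """Ideal transfer time for size_bytes at rate_bps (bits per second), no overhead."""
    return size_bytes * 8 / rate_bps


def average_rate_bps(size_bytes: float, seconds: float) -> float:
    """Average throughput in bits per second if size_bytes takes `seconds` to arrive."""
    return size_bytes * 8 / seconds


dataset_bytes = 750e12  # ~750 TB accessible, per the application (decimal TB assumed)
link_bps = 80e9         # the 80 Gbps line mentioned above

hours_at_line_rate = transfer_seconds(dataset_bytes, link_bps) / 3600
implied_mbps = average_rate_bps(dataset_bytes, 4 * SECONDS_PER_YEAR) / 1e6

print(f"At a full 80 Gbps: {hours_at_line_rate:.1f} hours")
print(f"Average rate if the download takes 4 years: {implied_mbps:.1f} Mbit/s")
```

At line rate the set would arrive in under a day, so a multi-year download implies the server delivers only tens of Mbit/s on average.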

26dos commented 1 year ago

@herrehesse

The data download speed is pretty good on our end. Here's the download link and the files on disk:

https://data.sdss.org/sas/dr18/data/apogee/spectro/apo/

[Screenshots: the download and the files on disk]

ghost commented 1 year ago

ok @26dos FYI @Sunnyiscoming

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

12PiB

Expected weekly DataCap usage rate

1PiB

Client address

f1gk53djusmlrwr2extftafi4m23agaacxxueh7aa

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1gk53djusmlrwr2extftafi4m23agaacxxueh7aa

DataCap allocation requested

512TiB

Id

187c1abe-44a6-40a9-9a2b-e90bd2f9fc11

OpenGate01 commented 1 year ago

The data and SPs are well prepared; as this is the first round, I am willing to support.

OpenGate01 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecxxhhx6hvqil65x2bxfpgsn473gp7yyo4oeqzwtpg7q5qzi6d352

Address

f1gk53djusmlrwr2extftafi4m23agaacxxueh7aa

Datacap Allocated

512.00TiB

Signer Address

f1im4hmtbfzqnx7ir74kdaiu4ynjhgqh3sdi2snla

Id

187c1abe-44a6-40a9-9a2b-e90bd2f9fc11

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecxxhhx6hvqil65x2bxfpgsn473gp7yyo4oeqzwtpg7q5qzi6d352

AlanGreaterheat commented 1 year ago

First round, willing to support

AlanGreaterheat commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedvf64yw6colzisobsd3kvom6z66lnu355muvutasirnd4zqludpo

Address

f1gk53djusmlrwr2extftafi4m23agaacxxueh7aa

Datacap Allocated

512.00TiB

Signer Address

f1pnmzlxj7cfeo2v6oj5nco46hkg2l46wj7o4xxui

Id

187c1abe-44a6-40a9-9a2b-e90bd2f9fc11

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedvf64yw6colzisobsd3kvom6z66lnu355muvutasirnd4zqludpo

26dos commented 1 year ago

checker:manualTrigger

cryptowhizzard commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 98.85% of deals are for data replicated across less than 2 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.
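The replication warning in the report above counts deals whose data is held by fewer than 2 storage providers. The checker's actual code is not shown in this thread; the following is a hypothetical Python sketch of one way such a percentage could be computed, counting deals (not bytes) and grouping them by piece CID. The deal records are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Deal = Tuple[str, str]  # hypothetical record: (piece_cid, provider_id)


def pct_deals_under_replicated(deals: Iterable[Deal], min_providers: int = 2) -> float:
    """Percentage of deals whose piece is held by fewer than min_providers distinct SPs."""
    deal_list: List[Deal] = list(deals)
    if not deal_list:
        return 0.0
    # Collect the distinct providers storing each piece.
    providers_per_piece: Dict[str, Set[str]] = defaultdict(set)
    for piece_cid, provider in deal_list:
        providers_per_piece[piece_cid].add(provider)
    # Count every deal whose piece lacks enough distinct providers.
    under = sum(
        1 for piece_cid, _ in deal_list if len(providers_per_piece[piece_cid]) < min_providers
    )
    return 100.0 * under / len(deal_list)


# Made-up data: piece "a" is on two SPs, pieces "b" and "c" on one SP each.
sample = [("a", "f01"), ("a", "f02"), ("b", "f01"), ("c", "f03")]
print(f"{pct_deals_under_replicated(sample):.2f}%")  # 2 of 4 deals -> 50.00%
```

A bytes-weighted variant would add each deal's piece size instead of counting it once.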

cryptowhizzard commented 1 year ago

There is only one SP supporting retrieval (HTTP only).

When I try to download something, it errors. This is gaming the retrieval bot.

Please fix ASAP.

[Screenshot 2023-08-23 at 21:54:43]

26dos commented 1 year ago

@cryptowhizzard Thank you for the reminder; the project is currently underway.

cryptowhizzard commented 1 year ago

So you are 250 TB in, out of the 512 TiB of DataCap granted.

So far, everything has been sent to one miner and retrieval is not working. The only answer we get here is "underway".

herrehesse commented 1 year ago

@raghavrmadya @simonkim0515 Flagging for abuse.

cryptowhizzard commented 1 year ago
[Screenshots 2023-08-26 at 21:48:53 and 21:50:40]

So, let's do the math here.

grp & hrp are the graphsync retrieval rate and HTTP retrieval rate according to the Fil+ retrieval bot.

f01834291 has received 0% of the total deals, so we don't need to count this one for retrieval.

f02226869 provides retrieval only over HTTP. One small check on the data revealed that you store garbage.

[Screenshot 2023-08-26 at 21:55:16]

26dos commented 1 year ago

@cryptowhizzard First of all, please cease your malicious speculations. If you think there is an issue with the Bot's logic, why not address the matter instead of damaging our client's reputation? Furthermore, our project has just begun, and maintaining a high HTTP retrieval rate should not be a problem.

Secondly, the data in your screenshot is not the data we stored. How did you access this information? The data you've presented looks more like garbled data. Please provide details of your actions.

cryptowhizzard commented 1 year ago

Stop the trash talk, will you?

As the data preparer, you are responsible for packing the right data and distributing it to your SPs. I advise you to read the Filecoin whitepaper, so you can learn that there is no way an SP can store different data than the data you provided to it.

Secondly, I provided screenshots above. As the data preparer, you should know what is wrong.

Thanks

kevzak commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 94.40% of deals are for data replicated across less than 2 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

kevzak commented 1 year ago

SPs provided: f0861589 (SDCLOUD, China), f02223876 (FTM Ltd., Hong Kong), f02115125 (FiveByte, Korea), f02212669 (Cryptomage, China)

SPs taking deals:

Provider | Location | ISP | Total deals sealed | Percentage | Unique data | Duplication
--- | --- | --- | --- | --- | --- | ---
f02239387 | Hong Kong, Central and Western, HK | 7Road International HK Limited | 79.53 TiB | 29.07% | 79.53 TiB | 0.00%
f02226869 | Nanchang, Jiangxi, CN | CHINA UNICOM China169 Backbone | 100.00 TiB | 36.55% | 100.00 TiB | 0.00%
f02368282 (new) | Chengdu, Sichuan, CN | CHINA UNICOM China169 Backbone | 2.84 TiB | 1.04% | 2.84 TiB | 0.00%
f02363742 (new) | Shenzhen, Guangdong, CN | CHINANET-BACKBONE | 15.25 TiB | 5.57% | 15.25 TiB | 0.00%
f02133079 | Hong Kong, Central and Western, HK | HK Broadband Network Ltd. | 7.88 TiB | 2.88% | 7.88 TiB | 0.00%
f01853077 | Singapore, Singapore, SG | Zenlayer Inc | 40.00 TiB | 14.62% | 40.00 TiB | 0.00%
f01852363 | Singapore, Singapore, SG | Zenlayer Inc | 27.63 TiB | 10.10% | 27.63 TiB | 0.00%
f01834291 | Los Angeles, California, US | Zenlayer Inc | 512.00 GiB | 0.18% | 512.00 GiB | 0.00%

@26dos, not one SP ID you provided matches the SPs taking deals. Per #922, I'm closing this until you confirm your storage plan: the SP entities, the locations involved, and the distribution.
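The mismatch can be checked mechanically by intersecting the SP IDs declared in the application with the providers in the checker table above. A minimal Python sketch, with the TiB values copied from that table (512 GiB entered as 0.5 TiB):

```python
# SP IDs declared in the application.
declared = {"f0861589", "f02223876", "f02115125", "f02212669"}

# TiB sealed per provider, copied from the checker table (512 GiB = 0.5 TiB).
sealed_tib = {
    "f02239387": 79.53,
    "f02226869": 100.00,
    "f02368282": 2.84,
    "f02363742": 15.25,
    "f02133079": 7.88,
    "f01853077": 40.00,
    "f01852363": 27.63,
    "f01834291": 0.5,
}

# Declared SPs that actually took deals.
overlap = declared & set(sealed_tib)
total_tib = sum(sealed_tib.values())
# Share of sealed data that went to SPs not listed in the application.
undeclared_tib = sum(tib for sp, tib in sealed_tib.items() if sp not in declared)
undeclared_share = 100 * undeclared_tib / total_tib

print(f"Declared SPs taking deals: {sorted(overlap) or 'none'}")
print(f"Total sealed: {total_tib:.2f} TiB, {undeclared_share:.1f}% with undeclared SPs")
```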

raghavrmadya commented 1 year ago

@kevzak, I see you have closed this application as completed. Can you confirm whether the DataCap must be removed based on the dispute here?

kevzak commented 1 year ago

@raghavrmadya I closed it because it did not meet the requirements of #922: the SPs did not match. The dispute appears to be a separate concern, regarding retrievals. You can follow your protocols there.