Closed cryptowhizzard closed 9 months ago
We have successfully stored 7 of our 27 NIH LDN applications, each totalling up to 5 PiB of data (some applications closed below 2 PiB). This final open application consolidates the remaining applications (8 through 27) into a single large LDN. By doing so, we aim to improve transparency, communication, and the notary signing process, making it easier for the community to follow.
We are currently awaiting the completion of our last datacap application and plan to seek manual assistance from Simon or Deep later this week.
Should you have any questions or require further information, please do not hesitate to reach out to us below.
@simonkim0515 @dkkapur
@kevzak We are assigned to FIL-E, can you put us back into a regular LDN?
Hello @cryptowhizzard - as was discussed on previous Notary Governance calls and also announced here, any application (public or private) over 15 PiB will now go through the full upfront check process via E-Fil+. Please complete the steps listed here and I will review and trigger the application for notary review.
I can confirm that @cryptowhizzard has completed the E-Fil+ Registration form and is sharing all information about the dataset and SPs involved. Also, because this is a public, open dataset, there is no KYB required.
120PiB
Expected weekly DataCap usage rate
1PiB
Client address
f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
Thanks for your request! :exclamation: We have found some problems in the information provided. The request cannot be posted because the identifier in the issue cannot be retrieved
Please, take a look at the request and edit the body of the issue providing all the required information.
120PiB
Expected weekly DataCap usage rate
1PiB
Client address
f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
f01940930
f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
512TiB
d6ee8282-9ba0-4cc2-8bdc-1f7dea6db97c
@kevzak With such a large amount, should we be cautious? I think the intensity of review should be proportional to the size of the quota. For an LDN this large, we should investigate the legality and appropriateness of the data use, evidence of data download, the applicant's transmission, storage, and processing capabilities, and the background of the SPs involved. In addition, I personally think datasets like this should be approved in phases, with goals for each phase.
My understanding is that the governance proposal raised the upper limit for a single LDN from 5 PiB to 15 PiB. Am I misunderstanding, or have the rules been updated again? There currently appears to be no upper limit on the amount requested in a single LDN.
@sxxfuture-official @Joss-Hua there seems to be some confusion here. Let me attempt to clarify for you because this is the first example of an application in this new process:
Regarding the DataCap request limits:
Regarding this specific application:
Regarding SXX comment: We should investigate the legality and rationality of data use, evidence of data download, transmission, storage, and processing capabilities, and background investigation of SPs.
This is exactly the correct thinking and expectation the community should have regarding notary due diligence.
I agree: For notaries, please check the applicant, check the data, check the SPs involved. With E-Fil+, a lot of this information is already collected in the application to help notaries complete due diligence.
For example: The applicant has shared a sample and they have shared the SPs involved. They have also completed 7 previous applications that can be reviewed on the CID Checker Report LINK1 LINK2 for this dataset.
In conclusion: The applicant is following the Fil+ program rules by using one application for a complete dataset. Notaries, feel free to review their application information above and also the previous storage history with this dataset and post any questions in the comments as needed.
I don't think Speedium is capable of utilizing such a large amount of DataCap.
First, most of the NIH NCBI Sequence Read Archive applications are still in progress and have seen no further sealing in the last few weeks.
Second, there are numerous problems with the DC usage reports for all of these applications. For example, data duplication and CID sharing between applications with non-identical data sets. https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1553#issuecomment-1461442216 https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1873#issuecomment-1558822890
Third, a large number of their previous applications included data set mixups, numerous notary approvals without providing data sets or data samples, and the actual usage of DataCap did not match the allocation plan provided. https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/348 https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/339 https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/540
Last, I would like to remind everyone that the proposal to remove the DC request limit is still in the pilot phase. We can't predict the impact of the change yet. It is illogical to approve a DataCap of 120 PiB during the 6-week trial period. https://github.com/filecoin-project/notary-governance/issues/851#issuecomment-1536475454
Additionally, many questions about Speedium are glossed over with reasons such as "it happened a long time ago" or "why are we being questioned so many times", instead of direct explanations.
They question miners from Asia and criticize their explanations, while not acknowledging the violations they themselves are involved in. As a global project, it would damage the harmony of the community if officials did not intervene and investigate in a timely manner. cc @dkkapur @dannyob @jbenet
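The CID-sharing check raised in the points above can be sketched as a small script. The deal records and application names below are hypothetical stand-ins; a real check would pull deal data from a chain explorer or the CID checker reports linked in this thread.

```python
from collections import defaultdict

def find_shared_cids(deals):
    """Map each piece CID to the set of client applications storing it,
    and keep only CIDs that appear under more than one application."""
    clients_by_cid = defaultdict(set)
    for piece_cid, client in deals:
        clients_by_cid[piece_cid].add(client)
    return {cid: clients for cid, clients in clients_by_cid.items()
            if len(clients) > 1}

# Hypothetical deal records: (piece CID, client application).
deals = [
    ("baga...aaa", "app-1553"),
    ("baga...aaa", "app-1873"),  # same piece CID under a different application
    ("baga...bbb", "app-1553"),
]
shared = find_shared_cids(deals)
```

Any entry in `shared` is a piece CID sealed for two different applications, which is the pattern the checker bot flags as "CID sharing".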
Dear @Yvette516 ,
Thank you for raising this matter once again. We share your commitment to maintaining the integrity of the platform and actively working to expose any fraudulent activities.
Most of the NIH NCBI Sequence Read Archive applications are closed now. Thank you for reminding us.
In regards to the application from Speedium, there have been multiple explanations provided that address the concerns you wish to discuss. These explanations clearly outline the reasons for the issue, the implemented solution, and confirm that the problem has not recurred. You can find additional details in the following links:
While we understand that you may have differing opinions on the matter, it is important to acknowledge that the responsible party, Cryptowhizzard, resolved the issue and addressed the question you raised months ago. We kindly request that you consider the information provided and refrain from further attempts at gaslighting; it has become quite evident that such attempts are no longer effective.
However, we want to emphasise that open and respectful debates are welcome within our community. We value diverse perspectives and encourage constructive discussions. In this regard, we invite you to participate in the public Trust and Transparency call scheduled for tomorrow. It will provide an opportunity to engage in a comprehensive debate and address any remaining concerns you may have regarding fraudulent activities.
If you are unwilling to listen to the answers provided for each of your questions, there is no reason for me to keep responding. I have engaged in discussions with multiple FIL+ admins, and we have collectively agreed to address all the points raised during the T&T call. Your absence from such discussions reflects the level of seriousness you attribute to the matter.
It is illogical to approve a DataCap of 120PiB during the 6-week trial period.
Everyone is fully aware of our business operations, including: @dkkapur @kevzak @Kevin-FF-USA @raghavrmadya and @galen-mcandrew
Your Datacap Allocation Request has been proposed by the Notary
bafy2bzacecblyxdz7acpmss2fbdbx5kgbblvigxwzkgwkaiovjf3o3k4b2mls
Address
f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
Datacap Allocated
512.00TiB
Signer Address
f1j3u7crhjzwb2cj5mq7vodlt4o66yoyci7lhcauy
Id
d6ee8282-9ba0-4cc2-8bdc-1f7dea6db97c
You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecblyxdz7acpmss2fbdbx5kgbblvigxwzkgwkaiovjf3o3k4b2mls
The client reached out to me on Slack. Everything looks all right on the application.
I communicated with @herrehesse on Slack and he showed me the size of the dataset they have downloaded so far. I can support the first round. Given that this is the first large LDN, I will be monitoring the client's next allocation.
Your Datacap Allocation Request has been approved by the Notary
bafy2bzacebtglrir7iri7cvre6eowmqyti27z6gjnfzkwtuoybhd5hmewfewm
Address
f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
Datacap Allocated
512.00TiB
Signer Address
f1pszcrsciyixyuxxukkvtazcokexbn54amf7gvoq
Id
d6ee8282-9ba0-4cc2-8bdc-1f7dea6db97c
You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebtglrir7iri7cvre6eowmqyti27z6gjnfzkwtuoybhd5hmewfewm
checker:manualTrigger
⚠️ 1 storage providers sealed too much duplicate data - f01208803: 20.79%
✔️ Data replication looks healthy.
⚠️ CID sharing has been observed. (Top 3)
[^1]: To manually trigger this report, add a comment with text checker:manualTrigger
[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger
[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...
Click here to view the full report.
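A rough sketch of how a duplicate-data percentage like the 20.79% flagged above could be computed from a provider's deal list. The exact formula the checker bot uses may differ; this simply counts deals whose piece CID the provider has already sealed.

```python
def duplicate_percentage(piece_cids):
    """Share of a provider's deals whose piece CID was already sealed
    by that same provider (a rough stand-in for the checker's metric)."""
    seen = set()
    duplicates = 0
    for cid in piece_cids:
        if cid in seen:
            duplicates += 1
        else:
            seen.add(cid)
    return 100.0 * duplicates / len(piece_cids) if piece_cids else 0.0

# Hypothetical deal list for one provider: 1 repeat out of 4 deals.
pct = duplicate_percentage(["cid-a", "cid-b", "cid-a", "cid-c"])
```

Here `pct` is 25.0; on a real provider the input would be the full list of piece CIDs from its deals.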
talked to @herrehesse , everything looks OK on our side
f01940930
f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
1PiB
d6ee8282-9ba0-4cc2-8bdc-1f7dea6db97s
checker:manualTrigger
⚠️ 1 storage providers sealed too much duplicate data - f01208803: 20.79%
✔️ Data replication looks healthy.
⚠️ CID sharing has been observed. (Top 3)
[^1]: To manually trigger this report, add a comment with text checker:manualTrigger
[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger
[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...
Click here to view the full report.
@cryptowhizzard you should reduce the duplicate data in f01208803; with that, I am willing to support this round.
Your Datacap Allocation Request has been proposed by the Notary
bafy2bzacebtjmgtb3f32wd6vlzw4xkztqpih2orasuvial3fcpfb57oufakmm
Address
f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
Datacap Allocated
1.00PiB
Signer Address
f1bp3tzp536edm7dodldceekzbsx7zcy7hdfg6uzq
Id
d6ee8282-9ba0-4cc2-8bdc-1f7dea6db97s
You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebtjmgtb3f32wd6vlzw4xkztqpih2orasuvial3fcpfb57oufakmm
Your Datacap Allocation Request has been approved by the Notary
bafy2bzacedzhs6ptum4bjau5jovfg3ltm3xmubx7d7bs6xzhjwthwntnde4zi
Address
f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
Datacap Allocated
1.00PiB
Signer Address
f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq
Id
d6ee8282-9ba0-4cc2-8bdc-1f7dea6db97s
You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedzhs6ptum4bjau5jovfg3ltm3xmubx7d7bs6xzhjwthwntnde4zi
checker:manualTrigger
⚠️ 1 storage providers sealed too much duplicate data - f01208803: 20.79%
✔️ Data replication looks healthy.
⚠️ CID sharing has been observed. (Top 3)
[^1]: To manually trigger this report, add a comment with text checker:manualTrigger
[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger
[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...
Click here to view the full report.
@Normalnoise
@cryptowhizzard you should reduce the duplicate data in f01208803; with that, I am willing to support this round.
I will immediately reach out to Holon @jhookersyd. To the best of my knowledge, any duplicates that may exist were a result of a bug that was resolved in January and have not reoccurred since then. However, I can certainly ask them to remove the duplicates. It's worth noting that these duplicates account for approximately 0.1% of the entire data, so their removal may not have a significant impact. Please inform me if you have a different perspective or believe otherwise.
Hi All, @herrehesse, @Normalnoise,
This duplicate file problem happened back in January, before Holon had a DB checking every file coming in. As with everything in the Filecoin ecosystem, you have to build tools on the fly to fix problems as they arise. Anthony Smith, our head of engineering, has worked practically every day for a year to make this happen.
FYI, this miner has been retired since the 7th of March. The sectors on this miner are simply unrecoverable; we gave up trying to fix them. The 20% amounts to about 100 TiB of data, or 0.1% of the project. With FIL at 4 dollars and collateral rewards going down, fixing this feels a little overzealous considering it happened 5 months ago and nobody has complained about it since. The Holon team has definitely learnt a lesson here: double-check everything! Thanks, Jonathan
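Jonathan's figures can be sanity-checked with quick arithmetic (the 120 PiB total comes from this application; the ~20% duplicate rate and ~100 TiB are his stated numbers):

```python
# ~20% duplicates on f01208803 equals roughly 100 TiB,
# measured against the 120 PiB requested for the whole project.
duplicate_tib = 100
project_tib = 120 * 1024            # 120 PiB = 122880 TiB
share = 100.0 * duplicate_tib / project_tib  # percent of the project
```

`share` works out to about 0.08%, consistent with the "0.1% of the project" figure quoted above.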
f01940930
f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
1PiB
9100057d-469b-4916-8624-e0ef4229ab20
Looks like the bot is having issues because of multiple applications using the same client address at the same time.
Notaries - Do not sign this until we can reset the allocation request. @simonkim0515 @fabriziogianni7
@kevzak @simonkim0515 @fabriziogianni7 Working on updating our system so we can use a new client address soon. Until then please assist with a manual reset & trigger so we can continue with our distribution.
@herrehesse - I've edited this allocation amount to 1PiB for 3rd allocation request. This matches the updated guidelines. Notaries are clear to sign this now. Thank you.
Your Datacap Allocation Request has been proposed by the Notary
bafy2bzaceajkgap6m6itreenpptvn2k7bkhebgvzzzbajigpmupvlax7j6xn4
Address
f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
Datacap Allocated
1.00PiB
Signer Address
f1hlubjsdkv4wmsdadihloxgwrz3j3ernf6i3cbpy
Id
9100057d-469b-4916-8624-e0ef4229ab20
You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceajkgap6m6itreenpptvn2k7bkhebgvzzzbajigpmupvlax7j6xn4
Your Datacap Allocation Request has been approved by the Notary
bafy2bzacea2pweedmc2mzzjxaildfbqib32mwhcugjopz2w6tj3q722eomzkq
Address
f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
Datacap Allocated
1.00PiB
Signer Address
f1bp3tzp536edm7dodldceekzbsx7zcy7hdfg6uzq
Id
9100057d-469b-4916-8624-e0ef4229ab20
You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea2pweedmc2mzzjxaildfbqib32mwhcugjopz2w6tj3q722eomzkq
Hello @cryptowhizzard, can you help me understand why I cannot read the data/documents you have stored? I can't recognize what data I've downloaded.
checker:manualTrigger
⚠️ 1 storage providers sealed too much duplicate data - f01208803: 20.79%
✔️ Data replication looks healthy.
⚠️ CID sharing has been observed. (Top 3)
[^1]: To manually trigger this report, add a comment with text checker:manualTrigger
[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger
[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...
Click here to view the CID Checker report. Click here to view the Retrieval report.
The retrieval success rate seems too low.
Can you explain the CID sharing?
@Sunnyiscoming = https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/2008#issuecomment-1567958183
Thanks, J
checker:manualTrigger
⚠️ 1 storage providers sealed too much duplicate data - f01208803: 20.81%
✔️ Data replication looks healthy.
⚠️ CID sharing has been observed. (Top 3)
[^1]: To manually trigger this report, add a comment with text checker:manualTrigger
[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger
[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...
Click here to view the CID Checker report. Click here to view the Retrieval report.
f01940930
f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
2000TiB
dac4fe66-d84e-454c-9d45-232ae75730fp
checker:manualTrigger
Your Datacap Allocation Request has been proposed by the Notary
bafy2bzacedr2a3pchmxxkzaq2q7gryaqnfikdfxmgfymoatnalptfeadra4ik
Address
f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
Datacap Allocated
1.95PiB
Signer Address
f1j3u7crhjzwb2cj5mq7vodlt4o66yoyci7lhcauy
Id
dac4fe66-d84e-454c-9d45-232ae75730fp
You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedr2a3pchmxxkzaq2q7gryaqnfikdfxmgfymoatnalptfeadra4ik
The client reached out to me on Slack. The CID Checker report looks within acceptable range. The SP allocation plan looks good.
Data Owner Name
NIH - National Institutes of Health
What is your role related to the dataset
Data Preparer
Data Owner Country/Region
United States
Data Owner Industry
Life Science / Healthcare
Website
https://www.nih.gov/
Social Media
Total amount of DataCap being requested
120 PiB
Expected size of single dataset (one copy)
15 PiB
Number of replicas to store
10
Weekly allocation of DataCap requested
1PiB
On-chain address for first allocation
f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
Data Type of Application
Public, Open Dataset (Research/Non-Profit)
Custom multisig
Identifier
efil
Share a brief history of your project and organization
Is this project associated with other projects/ecosystem stakeholders?
No
If answered yes, what are the other projects/ecosystem stakeholders
No response
Describe the data being stored onto Filecoin
Where was the data currently stored in this dataset sourced from
AWS Cloud
If you answered "Other" in the previous question, enter the details here
No response
How do you plan to prepare the dataset
IPFS, lotus, singularity
If you answered "other/custom tool" in the previous question, enter the details here
No response
Please share a sample of the data
Confirm that this is a public dataset that can be retrieved by anyone on the Network
If you chose not to confirm, what was the reason
No response
What is the expected retrieval frequency for this data
Monthly
For how long do you plan to keep this dataset stored on Filecoin
1 to 1.5 years
In which geographies do you plan on making storage deals
Greater China, Asia other than Greater China, North America, Europe, Australia (continent)
How will you be distributing your data to storage providers
HTTP or FTP server, IPFS, Lotus built-in data transfer
How do you plan to choose storage providers
Slack, Big Data Exchange, Partners
If you answered "Others" in the previous question, what is the tool or platform you plan to use
No response
If you already have a list of storage providers to work with, fill out their names and provider IDs below
How do you plan to make deals to your storage providers
No response
If you answered "Others/custom tool" in the previous question, enter the details here
No response
Can you confirm that you will follow the Fil+ guideline
Yes
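The totals in the form above imply the following pacing; this is simple arithmetic on the form's own numbers, not part of the application.

```python
total_pib = 120            # Total amount of DataCap being requested
weekly_pib = 1             # Weekly allocation of DataCap requested
dataset_pib = 15           # Expected size of single dataset (one copy)

weeks = total_pib / weekly_pib            # weeks to consume the full request
years = weeks / 52                        # same horizon in years
full_copies = total_pib / dataset_pib     # full 15 PiB copies the request covers
```

At the stated rate, the full 120 PiB would take 120 weeks (about 2.3 years) to allocate, and covers 8 full copies of the 15 PiB dataset.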