FILCAT / dotStorage-deal-renewal

A Solidity contract that makes renewal deals for nft.storage data with storage providers on Filecoin.

Adding preliminary sample list #13

Closed ribasushi closed 1 month ago

ribasushi commented 1 year ago

This list is accurate / describes actual available data. The prefix url/location needs to be determined by @dchoi27 and @vasco-santos from the daghaus team.

jennijuju commented 1 year ago

@ribasushi where is the data location?

ribasushi commented 1 year ago

@dchoi27 I updated the PR; it now has 448 entries. All of them are in R2, currently visible to the worker. You need to decide whether to go through with this or do something else...

snissn commented 1 year ago

hi @ribasushi i think what is missing here is the data download location for these files or location_ref

Reiers commented 1 year ago

https://github.com/lotus-web3/dotStorage-deal-renewal/issues/16

snissn commented 1 year ago

we need the car file size as well. we can drop the car filename.

elizabeth-griffiths commented 1 year ago

Hey @dchoi27 - @Shrenuj Bansal and team identified that we need to include the car size as part of the payload. Currently car size is not included in the csv. Can you please resend the csv with car size, in addition to the details in the csv we already have?

dchoi27 commented 1 year ago

i think @ribasushi got the original data from a database that has the DAG size, so it's probably most straightforward if he does that quickly tomorrow.

the filename was required to generate the download URLs in the version of this with the links

elizabeth-griffiths commented 1 year ago

Thanks @dchoi27 ! I thought I saw you share a csv, must have been for something else.

@ribasushi - @shrenuj Bansal and team identified that we need to include the car size as part of the payload. Currently car size is not included in the csv. Can you please resend the csv with car size, in addition to the details in the csv we already have?

dchoi27 commented 1 year ago

i did share a CSV. it was the same CSV as the one in this PR that riba made, but with the download links (what i referred to here as the version of this with the links). it was shared over slack to not make the download links public. i did not contribute to this PR, so not sure why you asked me here.

all i'm saying is that it's probably fastest for him to get the DAG sizes, because i think the data source he queried to get the CSV in this PR has them.

(you don't have to tag him again in a copy-pasta, he'll see this thread when he wakes up tomorrow)

dchoi27 commented 1 year ago

actually i might be able to fish out the DAG sizes using R2's CLI and some scripting to grab them. @ribasushi's underwater so i'll try and save him from worrying about this. looking into it now

elizabeth-griffiths commented 1 year ago

@dchoi27 Separate but related: I was able to connect with the team today regarding the URL expiry period (extending it from 7 days). Since you're working on this now, I wanted to flag it, as I believe it may cause rework if we decide to change the expiry later.

Net is, we would like to change from 7 days to 30 days. I was double-checking the rationale with the team and just got confirmation (what good timing!). Here's why:

Also, importantly, and to answer your other question from yesterday: SPs download from the URLs in the contract, not the github repo.

dchoi27 commented 1 year ago

^ i think this is the wrong place to talk about this

dchoi27 commented 1 year ago

OK - i think this should be right (since i didn't have the query @ribasushi ran, i just downloaded the entire aggregates table out of dagcargo and used Google Sheets to join the sizes). I spot-checked a number of them and they look right.

Didn't have permission to commit to this PR, so here's a link to the spreadsheet (it's in the first tab): https://docs.google.com/spreadsheets/d/1Kw0zZh6xSGLvU0TK05SCUMEuBi8OtP3p81UdGGneHHg/edit?usp=sharing

jennijuju commented 1 year ago

@dchoi27 I have sent you an invite with write perm

ribasushi commented 1 year ago

Folks NO. At no point during the dealmaking process do you need the actual size of the car. This is precisely why I didn't send it. Please adjust the contract and remove the superfluous info, things are hard enough as it is.

jennijuju commented 1 year ago

> Folks NO. At no point during the dealmaking process do you need the actual size of the car. This is precisely why I didn't send it. Please adjust the contract and remove the superfluous info, things are hard enough as it is.

unfortunately, boost is asking for it https://github.com/lotus-web3/dotStorage-deal-renewal/pull/13#issuecomment-1459631110

(also mentioned here)

jennijuju commented 1 year ago

@dchoi27 just to confirm, ideally the final file has the following columns: pieceCID, pieceSize, carSize, locationURL

dchoi27 commented 1 year ago

what's the difference between pieceSize and carSize? only car file size was asked for above. i think they might be the same in this case since the CAR is already aggregated (assume it has padding already, etc.)?

you all have access to this spreadsheet https://github.com/lotus-web3/dotStorage-deal-renewal/pull/13#issuecomment-1459156742. please be prescriptive if anything is missing; i don't know what y'all need, so i'm just following what you're asking for in the thread
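For reference on the pieceSize vs carSize question: in Filecoin the two generally differ, because the deal's piece is Fr32-padded (every 127 bytes of data expand to 128) and then rounded up to the next power of two. A minimal sketch of that relationship (the helper name is hypothetical, not from this repo):

```python
import math

def padded_piece_size(car_size: int) -> int:
    """Estimate the Filecoin padded piece size for a CAR of `car_size` bytes.

    Fr32 padding expands every 127 data bytes to 128 bytes, and the result
    is rounded up to the next power of two (128-byte minimum).
    """
    fr32_size = math.ceil(car_size * 128 / 127)
    return max(128, 1 << (fr32_size - 1).bit_length())

print(padded_piece_size(127))        # 128
print(padded_piece_size(1_000_000))  # 1048576 (1 MiB piece)
```

This is why a CAR's raw byte length and the padded piece size in the spreadsheet can legitimately be different numbers for the same record.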

jennijuju commented 1 year ago

> what's the difference between pieceSize and carSize? only car file size was asked for above. i think they might be the same in this case since the CAR is already aggregated (assume it has padding already, etc.)?
>
> you all have access to this spreadsheet #13 (comment) please be prescriptive if anything is missing, i don't know what ya'll need so i'm just following what you're asking for in the thread

The spreadsheet has piece CID, piece size, and car sizes, and we are good there; we just need to make sure the final csv that you will be creating next Tuesday also has the location, in the same file.

dchoi27 commented 1 year ago

oh LOL sorry the piece size with padding is in there already. my b, i missed it.

let's leave this PR alone as far as the final deliverable goes. i didn't put the download URLs here because they shouldn't be public yet - i'll send the final file over slack (like i did the last one)

dchoi27 commented 1 year ago

anyway, worst case scenario, as long as you have the updated download links with some unique identifier by record, you can always join it with the file in the spreadsheet
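The join described above is mechanical as long as both files share a key; a sketch assuming the shared key is a pieceCID column and the links file has a locationURL column (function and column names are hypothetical):

```python
import csv

def join_on_piece_cid(sizes_path: str, links_path: str, out_path: str) -> None:
    """Join a sizes csv and a download-links csv on their shared pieceCID column.

    Rows in the sizes file with no matching link get an empty locationURL.
    """
    # Build a pieceCID -> locationURL lookup from the links file.
    with open(links_path, newline="") as f:
        links = {row["pieceCID"]: row["locationURL"] for row in csv.DictReader(f)}

    with open(sizes_path, newline="") as f_in, open(out_path, "w", newline="") as f_out:
        reader = csv.DictReader(f_in)
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames + ["locationURL"])
        writer.writeheader()
        for row in reader:
            row["locationURL"] = links.get(row["pieceCID"], "")
            writer.writerow(row)
```

Given the sensitivity of the links, doing this locally (rather than in a shared sheet) also keeps the URLs out of public view.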

jennijuju commented 1 year ago

@dchoi27 FYI - this is the csv format the eng team prefers for Tuesday's data: https://github.com/lotus-web3/dotStorage-deal-renewal/blob/main/scripts/2mbsample.csv

please provide the data in this schema to help with a smooth operation.
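A quick sanity check before sending could catch schema drift early. A sketch, assuming the column set named earlier in the thread (the canonical schema is the linked scripts/2mbsample.csv, so adjust `EXPECTED_COLUMNS` to match it):

```python
import csv

# Assumed column set, per the discussion above; verify against 2mbsample.csv.
EXPECTED_COLUMNS = ["pieceCID", "pieceSize", "carSize", "locationURL"]

def check_schema(path):
    """Return a list of problems; an empty list means the csv matches."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_COLUMNS:
            problems.append("header is %r, expected %r"
                            % (reader.fieldnames, EXPECTED_COLUMNS))
        # Data rows start at line 2; the size columns must be plain integers.
        for lineno, row in enumerate(reader, start=2):
            for col in ("pieceSize", "carSize"):
                if not (row.get(col) or "").isdigit():
                    problems.append("line %d: %s is not an integer" % (lineno, col))
    return problems
```

Running this on both sides of the handoff would have surfaced the missing-column back-and-forth above in seconds.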

dchoi27 commented 1 year ago

sure, my script just adds the download links to the CSV that riba provided, but i can put it in google sheets and get it into that format if it's helpful

jennijuju commented 1 year ago

> sure, my script just adds the download links to the CSV that riba provided, but i can put it in google sheets and get it into that format if it's helpful

thank you! It will save us the time of joining the csvs ourselves and prevent us from making mistakes - so it would be super helpful. 💙

ribasushi commented 1 month ago

This is mega-outdated