Closed ryancwalsh closed 2 years ago
I will work on this in this branch / draft PR here: https://github.com/NEAR-Edu/near-certification-tools/pull/19
This successfully returns a result in pgAdmin (although I still need to think through the logic and whether the sorting of each part is correct):
SELECT moment, diff_to_previous FROM
(
SELECT *,
((EXTRACT(epoch FROM moment) - EXTRACT(epoch FROM lag(moment) over (ORDER BY moment))) / (60*60*24))::int
AS diff_to_previous
FROM (SELECT TO_TIMESTAMP(r."included_in_block_timestamp"/1000000000) as moment
FROM PUBLIC.RECEIPTS R
LEFT OUTER JOIN PUBLIC.ACTION_RECEIPTS AR ON R.RECEIPT_ID = AR.RECEIPT_ID
WHERE SIGNER_ACCOUNT_ID = 'hatchet.testnet'
AND r."included_in_block_timestamp" > 0
ORDER BY moment DESC
) as relevantActivity
) as activityWithDiffs
WHERE diff_to_previous > 10
ORDER BY moment ASC
LIMIT 1
But the query at https://github.com/NEAR-Edu/near-certification-tools/pull/19/commits/07e50ce23b856e4080ef96e52d65b59a3420b79e#diff-820adf0456b1bc82dc216822458c23884ee9abcf73fe2bd4241ddd62fd69b571R87 still does not run without error.
Ok, after https://github.com/NEAR-Edu/near-certification-tools/commit/735a88e7cef8efe54aece4374f435ebc069d84cb now at least getStartOfFirstLongPeriodOfInactivity successfully runs when I visit http://localhost:3000/api/cert/103216412112497cb6c193152a27c49a.png (if I temporarily set const expirationDays = 4; since in reality hatchet.testnet does not have 180 days of inactivity).
Next, we will want to scrutinize everything to make sure it all makes sense, improve comments, maybe add some tests.
I worked on the logic of calculating the expiration date yesterday using the raw query we had with Prisma. I see you already handled that.
I still wanted to push my WIP code as a draft PR (to be deleted immediately after), but it looks like I don't have permission to push to this repo. It still needed refactoring, because it had redundant lines, non-descriptive variable names, and a logical mistake at one point.
This is the snippet of the code I had: https://codeshare.io/nzwylY
It would be unnecessary to have a PR that'll be deleted anyway, but I'd appreciate it if you could grant me permission for future PRs. I will check how to refactor the query if I find time today and see what I can do.
Continuing this from here: https://discord.com/channels/828768337978195969/906115083250307103/944123557963788338
It made sense at first but now I'm questioning it again.
You approved this logic to be correct:
1- If diff_to_previous > 180 happens before the issue date ==> Expiration date = issue date + 6 months;
2- If diff_to_previous > 180 happens after the issue date ==> Expired;
In this case the expiration date is exactly the moment in the past where diff_to_previous > 180 happened.
3- If the query returns no result where diff_to_previous > 180, meaning the account doesn't have a period after the issue date where it hasn't been active for 180 days straight ==> Expiration date = last activity + 6 months
But if we only query results after the issue date, don't we lose the necessary data to distinguish between 1 and 3 above?
I tried to add this logic into fetchCertificateDetails(), where the issue date is within easy reach (please see the codeshare snippet in the previous comment). There's a chance I'm missing something here but I just want to make sure.
So, updating the conditions:
1- If diff_to_previous > 180 returns a result ==> Expired;
In this case the expiration date is exactly the moment in the past where diff_to_previous > 180 happened.
2- If no query result where diff_to_previous > 180 ==> Expiration date = last activity + 6 months
We don't care about the issue date (except in the query obviously) to calculate the expiration date, is this correct?
I spent some time going through the Prisma.io docs and first tried to figure out how to translate lag() and compute the date difference with Prisma syntax.
I tried to create a function that would calculate diff_to_previous between dates, starting off from the query we have here and listing the activities after the issue date. I referenced this: https://www.prisma.io/docs/concepts/components/prisma-client/computed-fields, but the code below doesn't have much in common with it. It mainly helped me understand that I should do the calculations outside of the query, I guess.
async function getDateDiff(accountName: string, issuedAt: string) {
  // Note: nanosecond values exceed Number.MAX_SAFE_INTEGER; BigInt would be safer here.
  const issuedAtUnixNano = dayjs(issuedAt).unix() * 1_000_000_000;
  const rows = await prisma.receipts.findMany({
    where: {
      action_receipts: {
        // https://www.prisma.io/docs/concepts/components/prisma-client/filtering-and-sorting#filter-on-relations
        signer_account_id: {
          equals: accountName,
        },
      },
      included_in_block_timestamp: {
        gte: issuedAtUnixNano, // only activity after the issue date is relevant (was `lte`, which listed activity before it)
      },
    },
    orderBy: {
      included_in_block_timestamp: 'desc',
    },
  });
  const activityDates = rows.map((row) => row.included_in_block_timestamp);
  console.log('################ activityDates ################', activityDates);
}
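For what it's worth, the lag()-style diff the SQL computes could also be done client-side once the timestamps are fetched. A minimal sketch (hypothetical helper, not in the PR; it assumes the timestamps come back as nanosecond bigints sorted ascending):

```typescript
// Sketch only: the client-side equivalent of the SQL lag() diff. Returns the
// END moment of the first gap longer than `thresholdDays`, or null if none.
const NANOS_PER_SECOND = 1_000_000_000n;
const SECONDS_PER_DAY = 86_400;

function firstLongGapEnd(timestampsNanoAsc: bigint[], thresholdDays: number): bigint | null {
  for (let i = 1; i < timestampsNanoAsc.length; i += 1) {
    // Convert to whole seconds first so the Number conversion stays exact.
    const gapSeconds = Number((timestampsNanoAsc[i] - timestampsNanoAsc[i - 1]) / NANOS_PER_SECOND);
    if (gapSeconds / SECONDS_PER_DAY > thresholdDays) {
      return timestampsNanoAsc[i]; // like `moment` in the SQL: the end of the gap
    }
  }
  return null; // no gap longer than thresholdDays
}
```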
This approach already proved to be a bad idea from the very start, even before trying to get the time differences:
I'd like to explore further, but considering the time constraint, this will definitely take me time (there's always hope, right?). This looks like something I'll learn from your solution in the end. I'm pausing here for now.
@eadsoy At https://github.com/NEAR-Edu/near-certification-tools/commit/07e50ce23b856e4080ef96e52d65b59a3420b79e#diff-820adf0456b1bc82dc216822458c23884ee9abcf73fe2bd4241ddd62fd69b571R109-R110 I wrote:
* If the account doesn't have a period where it hasn't been active for 180 days straight after the issue date: Expiration date = last activity + 180 days
* Otherwise, if 1 or more >180-day periods of inactivity exist after issueDate, expiration = the beginning of the *first* such period + 180 days.
I think that's what we want, right? And I thought that's what you were saying in your 3 points (once you incorporated what I said about ignoring any long periods of inactivity that happened before issueDate since they would be irrelevant).
I haven't been able to think of a way to handle it in just one SQL query, which is why getExpiration is a wrapper around 2 functions, getStartOfFirstLongPeriodOfInactivity and getMostRecentActivityDateTime. I don't know whether a single SQL query would be possible.
I also don't know whether Prisma even supports abstracting away the complicated logic we're trying to do here. We might need to use the raw feature (that we're starting with). Prisma is a relatively new tool and can't handle everything people want to do.
I looked at your codeshare and it looked like it was the right direction, and I think you're thinking about this well and asking good questions. I gave you access to the repo; sorry you didn't have it til now.
I think that's what we want, right? And I thought that's what you were saying in your 3 points (once you incorporated what I said about ignoring any long periods of inactivity that happened before issueDate since they would be irrelevant).
I understand considering any long inactivity (180-day) period before the issue date as irrelevant. I tried to draw two scenarios (the second I thought of later). These are not to scale but I hope they're still clear enough:
Scenario 1 was actually what was concerning me all along, but I wasn't able to explain it properly; it wasn't clear enough for me either. Hence why I thought we need some information about activity before the issue date.
* If the account doesn't have a period where it hasn't been active for 180 days straight after the issue date: Expiration date = last activity + 180 days
So in this case I was thinking we still need to have a closer look at the pre-issue date period.
Scenario 2 contains another situation I wanted to ask you about (please see text under Scenario2.)
I guess both of these are unimportant concerns right? I'm sorry if I'm creating unnecessary confusion.
I haven't been able to think of a way to handle it in just one SQL query, which is why getExpiration is a wrapper around 2 functions, getStartOfFirstLongPeriodOfInactivity and getMostRecentActivityDateTime. I don't know whether a single SQL query would be possible.
I tried to merge both queries we have in getStartOfFirstLongPeriodOfInactivity and getMostRecentActivityDateTime this morning --> run the first query and check for 180-day inactivity; if no result, run the second query and get the most recent activity date; but I haven't been successful yet. I will be working on it today.
I also don't know whether Prisma even supports abstracting away the complicated logic we're trying to do here. We might need to use the raw feature (that we're starting with). Prisma is a relatively new tool and can't handle everything people want to do.
It definitely can be that I'm lacking knowledge to decide correctly or maybe I gave up too early but what I understood is, in our case, it doesn't seem too promising.
I looked at your codeshare and it looked like it was the right direction, and I think you're thinking about this well and asking good questions. I gave you access to the repo; sorry you didn't have it til now.
Thank you!
Using this approach with CTEs: https://stackoverflow.com/a/68684814/10684149, I named the first CTE query that looks for a long period of inactivity long_period_of_inactivity, and the second one that returns the most recent activity date most_recent_activity. If the first query doesn't return a result, the second query is executed.
In the second query I selected moment as we have in the first one, and included_in_block_timestamp from receipts just to match the integer data type of diff_to_previous and see if we get results. This is very long and the execution time was around 459 ms with my testnet account. I'm just adding it here as one whole working query.
WITH long_period_of_inactivity AS (
SELECT
moment,
diff_to_previous
FROM (
SELECT *,
/* 1 day = 60 sec * 60 min * 24 h = 86400 sec */
((EXTRACT(epoch FROM moment) - EXTRACT(epoch FROM lag(moment) over (ORDER BY moment))) / 86400)::int
AS diff_to_previous
FROM (
SELECT TO_TIMESTAMP(R."included_in_block_timestamp"/1000000000) as moment
FROM PUBLIC.RECEIPTS R
LEFT OUTER JOIN PUBLIC.ACTION_RECEIPTS AR ON R.RECEIPT_ID = AR.RECEIPT_ID
WHERE 'esinadsoy.testnet' IN (SIGNER_ACCOUNT_ID)
ORDER BY moment DESC
) as t1
) as t2
/* Change to diff_to_previous > 180 */
WHERE (diff_to_previous > 180)
ORDER BY moment DESC
LIMIT 1
), most_recent_activity AS (
SELECT TO_TIMESTAMP(receipt."included_in_block_timestamp"/1000000000) as moment, included_in_block_timestamp
FROM (
SELECT *
FROM PUBLIC.receipts R
LEFT OUTER JOIN PUBLIC.ACTION_RECEIPTS AR ON R.RECEIPT_ID = AR.RECEIPT_ID
WHERE 'esinadsoy.testnet' IN (SIGNER_ACCOUNT_ID)
) as receipt
WHERE NOT EXISTS (TABLE long_period_of_inactivity)
ORDER BY moment DESC
LIMIT 1
)
TABLE long_period_of_inactivity
UNION ALL
TABLE most_recent_activity
If I look for over 180 days of inactivity period, since it doesn't exist, it runs the second query and moment is the most recent activity date:
If I look for over 5 days of inactivity period, since it exists, it runs the first query we already had:
@eadsoy Thanks for the diagrams and explanations at https://github.com/NEAR-Edu/near-certification-tools/issues/17#issuecomment-1046625720
Scenario 1 probably would never happen since I think the certificate being issued (on the Issue Date) would cause activity on mainnet for that account. Even if it didn't, though, we definitely feel like someone's competencies are "fresh" as of Issue Date.
Scenario 2 is a great catch. You're right. Our approach of using the lag function to calculate the diff between each row is not a comprehensive enough check. I think for the most recent row we also need to calculate the diff between that moment and "now".
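That extra check could be sketched like this (hypothetical helper; the real fix lands in the SQL later in this thread):

```typescript
// Hypothetical helper: whole days between the most recent activity and the
// render date ("now"), so an account that simply went quiet after its last
// activity is also caught, even if no row-to-row gap exceeded the threshold.
const MS_PER_DAY = 86_400_000;

function daysSince(lastActivity: Date, renderDate: Date): number {
  return Math.floor((renderDate.getTime() - lastActivity.getTime()) / MS_PER_DAY);
}
```

With that, `daysSince(lastActivity, new Date()) > 180` would mean the certificate is expired even when every diff_to_previous was under 180.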
I had never heard of CTEs and am glad you found them.
I'm really appreciating your thought process, questions, and your pursuit of these new approaches (like CTEs).
If you're able to continue solving these problems on this branch https://github.com/NEAR-Edu/near-certification-tools/tree/feature/getExpiration that would be wonderful.
#18 (comment) Thank you very much @ryancwalsh, for being patient and answering all my questions without skipping a point.
Scenario 1 probably would never happen since I think the certificate being issued (on the Issue Date) would cause activity on mainnet for that account. Even if it didn't, though, we definitely feel like someone's competencies are "fresh" as of Issue Date.
Ahh I see, yes both cases make perfect sense to me. Got it.
Scenario 2 is a great catch. You're right. Our approach of using the lag function to calculate the diff between each row is not a comprehensive enough check. I think for the most recent row we also need to calculate the diff between that moment and "now".
I tried to include the render date (or now) in both queries respectively, and added an additional column called diff_from_last_activity_to_render_date which shows the diff between the most recent activity date and now.
Having it in either of the queries worked fine, but I put it in the second query where we look for the most recent activity date. This is how I approached it:
When using a CTE, both query results must have the same number of columns and the same data types; I tried to use this to our advantage:
1- The first query runs (long_period_of_inactivity):
If it returns a result, it means the account had a 180-day period of inactivity after the issue date. I changed the order to ASC to get the first incident of such a period if more than one exists. I thought this would make sure to lock in the expiration date, and we can still limit the result to one. If I were to keep it as DESC, it would show the last period of inactivity if more than one exists, and the expiration date would be calculated from that date (hope that makes sense).
- moment shows the "end date" of that period.
- diff_to_previous shows the days of inactivity.
- diff_from_last_activity_to_render_date is set to a default value, which is 1234567890.
2- If the first query doesn't return any result, the second query runs (most_recent_activity):
This query returns the most recent activity date, ordered by DESC.
- moment shows the last activity date.
- diff_to_previous is set to a default value, which is 1234567890.
- diff_from_last_activity_to_render_date shows the days between the render date and the most recent activity date.
The number 1234567890 represents days in both query results. This equals ~3382378 years. Keeping this in mind, I thought I could use it as some kind of 'password' to differentiate between the query results: to decide whether we're getting the most recent activity date or an inactivity period.
/**
 * If the account doesn't have a period where it hasn't been active for 180 days straight after the issue date:
 * Days between last activity and render date are checked:
 * If this value is >180: certificate is expired. Expiration date = last activity + 180 days
 * If this value is <180: certificate hasn't expired yet. Expiration date = last activity + 180 days
 * Otherwise, if a >180-day period of inactivity exists after issueDate, expiration = the beginning of the *first* such period + 180 days.
 */
const moment = dayjs(result[0].moment);
let expirationDate;
if (result[0].diff_to_previous === QUERY_DEFAULT_CELL_VALUE) {
expirationDate =
result[0].diff_from_last_activity_to_render_date > expirationDays
? `Expired on ${formatDate(moment.add(expirationDays, 'days'))}`
: formatDate(moment.add(expirationDays, 'days'));
} else {
/**
 * A >180-day period of inactivity exists.
 * moment is the end date of such a period.
 * Subtract 180 from diff_to_previous to get how many days beyond the 180-day limit the gap ran
 */
const daysToMomentOfExpiration = result[0].diff_to_previous - 180;
/**
 * Subtract daysToMomentOfExpiration from moment to get the specific date of expiration.
 * This subtraction equals (start of inactivity period + 180 days)
 */
expirationDate = `Expired on ${formatDate(moment.subtract(daysToMomentOfExpiration, 'days'))}`;
}
return expirationDate;
After the query is executed:
1- I check the diff_to_previous value.
If it's equal to QUERY_DEFAULT_CELL_VALUE, which is our default value of 1234567890, it means that the query returned the most recent activity date and a 180-day period of inactivity doesn't exist.
- If diff_from_last_activity_to_render_date is over 180 days, the certificate has expired.
Expiration date = last activity + 180 days. The certificate will show "Expired on last activity + 180 days". This solves the issue in Scenario 2.
- If diff_from_last_activity_to_render_date is below 180 days, the account had activity in the last 180 days. The certificate has not expired yet.
Expiration date = last activity + 180 days.
2- If diff_to_previous is not equal to QUERY_DEFAULT_CELL_VALUE, the query returned the first date where 180 days of inactivity occurred.
I take diff_to_previous and get the difference between this period and 180 days. Then I subtract this difference from moment. This gives us the start date of the inactivity period + 180 days, which is the expiration date.
I'm not sure if this approach with a fixed number (1234567890) to separate the query results is a good way (although it's good to know we will be fine for 3382378 years).
I will both look for an alternative and also for ways to refactor the code.
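One alternative to the sentinel, sketched here with a hypothetical row shape: let whichever branch didn't run leave its column as NULL, and branch on null instead of a magic number.

```typescript
// Sketch of an alternative to the 1234567890 sentinel (row shape is hypothetical):
// each CTE branch leaves the column it doesn't need as NULL.
type ExpirationRow = {
  moment: Date;
  diff_to_previous: number | null; // NULL when most_recent_activity produced the row
  diff_from_last_activity_to_render_date: number | null; // NULL when long_period_of_inactivity produced the row
};

// Decide which CTE produced the row by checking which column is non-null.
function rowSource(row: ExpirationRow): 'long_period_of_inactivity' | 'most_recent_activity' {
  return row.diff_to_previous !== null ? 'long_period_of_inactivity' : 'most_recent_activity';
}
```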
I have pushed it, since it's working.
I had never heard of CTEs and am glad you found them.
Thank you. I've read that it's used to make complex queries more readable, and I'm happy you found it useful too.
Also, I'm enjoying this very much!
Screenshots from PgAdmin:
While working on feature/getExpiration, I switched to develop to get recent changes, switched back to feature/getExpiration and rebased my branch.
After my work was done I staged the changes, committed, and pushed, but I got this message: Your branch and 'origin/feature/getExpiration' have diverged... which is expected (right?), because I rebased feature/getExpiration onto develop, which put it ahead of the remote origin/feature/getExpiration.
So I ran git pull --rebase, but I had to go through ~50 commits manually to catch up with develop.
Instead, since I had develop on my feature/getExpiration branch, I thought it was ok to force push so that origin/feature/getExpiration gets updated too.
Now when you view the files changed in feature/getExpiration, I think it compares against what develop has in [imageFileName].ts, instead of your last commit 4d2f58d.
But the commit history of feature/getExpiration looks fine to me? https://github.com/NEAR-Edu/near-certification-tools/commit/ef239675c814d68e3df277d1f50c64a7bea2007e
I'd love to learn if this was an incorrect approach and what I should have done instead.
Hey @eadsoy !
Sorry for not telling you to use merge instead of rebase. I made a video for you here: https://www.loom.com/share/2ec42b3d8f94486eb533b5ebd5930e53
That's my fault. I hope I didn't cause you too much of a headache.
I wonder if https://github.com/NEAR-Edu/near-certification-tools/tree/feature/getExpirationCTE gets you back to where you were.
If so, can you open a draft PR from that branch, and continue working in that branch?
Thanks.
P.S. Other T3 members might have more capacity to help now (so you could ask them) since what you're working on is now the highest priority of the remaining issues: https://github.com/NEAR-Edu/near-certification-tools/issues
@eadsoy I haven't looked at your new commits yet, but thanks for pushing them!
One thing that would be helpful if you have time for it (and if we still think this complicated expiration feature is worthwhile) is to think through, without code, all the possible scenarios (which you already started to do). I.e., what various snapshots of a database (with specific data in different rows of different tables) would we need if we wanted to make sure that this function works as expected?
This would be a pain to do but might be worthwhile to give us confidence that our function isn't missing something.
Hi @ryancwalsh!
Sorry for not telling you to use merge instead of rebase. I made a video for you here: https://www.loom.com/share/2ec42b3d8f94486eb533b5ebd5930e53
That's my fault. I hope I didn't cause you too much of a headache.
No no, thank you very much but really, no worries. I think I should have figured it out so it's completely fine really.
And thank you for the video, I recorded a video as well to give you an update (there might be some children's voices towards the very end, I think there's a parade of kindergarteners outside, sorry 🙃 ) https://www.loom.com/share/331b67a640d0466a8fff245613304d9f
I removed the random number I had yesterday. I completely agree that was not a good way of approaching this.
I'd appreciate if you could have a look at the last version (f288870) and give your feedback.
Also, in the code I saw this comment of yours
let expiration = null; // The UI (see `generateImage`) will need to gracefully handle this case when indexer service is unavailable.
What would be the best way to handle this? What should be shown in that case? I added a line to the handler function to display a text but didn't commit it.
Lastly, I am returning an object (expiration) from the getExpiration function, which has expirationDate and isExpired properties. I am not sure how you'd like to display the expiration message in case the certificate is expired, so I thought this might be a good way to check whether the certificate is expired or not and adjust the displayed text accordingly.
https://github.com/NEAR-Edu/near-certification-tools/issues/17#issuecomment-1049903552 @ryancwalsh I just saw your comment, sorry.
Of course, will do.
@eadsoy Don't worry about "gracefully handle this case when indexer service is unavailable" for now; we'll handle that in a separate conversation.
I just watched your video. I haven't looked at the code yet or tried playing with it myself, and I still don't understand it. But I hope to dig into it later.
One thought / question I have from the video: my inclination was to not have the boolean isExpired in this function (i.e. not return an object), since wherever this function is called we could just compare the returned expiration date to "now".
For some reason, that feels a bit cleaner to me (e.g. calling this function getExpiration and having it return the simple expiration date, and then if/when we need to compare to "now", we do it outside this function).
I just watched your video. I haven't looked at the code yet or tried playing with it myself, and I still don't understand it. But I hope to dig into it later.
I'm sorry, that's probably because I should've done a better job at explaining. Would you like me to take another video or write a brief walk-through?
One thought / question I have from the video: my inclination was to not have the boolean isExpired in this function (i.e. not return an object), since wherever this function is called we could just compare the returned expiration date to "now".
For some reason, that feels a bit cleaner to me (e.g. calling this function getExpiration and having it return the simple expiration date, and then if/when we need to compare to "now", we do it outside this function).
I agree, it looks cleaner and makes more sense.
I wasn't sure what we'd display if the certificate is expired, so I thought, in case we want to show e.g. a red 'Expired' text + a black date string, isExpired could be checked to add 'Expired' in front of the date while adding text to the certificate.
Let me refactor the function and return only the date. The function is called in fetchCertificateDetails; there I can check whether it's expired, and then should I return a text like this if it is expired? :
@eadsoy No need for another video yet, thanks. I just haven't had time to look at the code and think about this stuff.
As for what to show in the image, I was leaning towards asking Dan to remove the "Expiring:" label and allow us to generate it dynamically (and maybe we want that same freedom with all labels). Then for expired dates, the label would say "Expired:", and for non-expired ones, we could either say "Expiration: ", "Expires:", or "Expiring:", depending on what our team thinks sounds best.
Actually, I just remembered that I've wanted to change the layout even more than that. Because I think we should create room to add an explanation about how the expiration works.
I think I'd shared some thoughts in the Discord channel a while ago that I could go back and find.
As for what to show in the image, I was leaning towards asking Dan to remove the "Expiring:" label and allow us to generate it dynamically (and maybe we want that same freedom with all labels). Then for expired dates, the label would say "Expired:", and for non-expired ones, we could either say "Expiration: ", "Expires:", or "Expiring:", depending on what our team thinks sounds best.
I refactored getExpiration to return only expiration. I added a comparison to the current date where we call getExpiration in fetchCertificateDetails. If the certificate is expired, it adds 'Expired: ' to the beginning of the expiration string, and 'Expires: ' if it's still valid (for now). See commit: 4107799
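The prefix logic, as a tiny sketch (hypothetical helper; the actual code in fetchCertificateDetails formats dates differently, simplified here to ISO dates):

```typescript
// Sketch only: pick 'Expired: ' vs 'Expires: ' by comparing the expiration
// date to the render date, then prepend it to the formatted date string.
function expirationLabel(expiration: Date, now: Date): string {
  const prefix = expiration.getTime() < now.getTime() ? 'Expired: ' : 'Expires: ';
  return `${prefix}${expiration.toISOString().slice(0, 10)}`;
}
```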
Actually, I just remembered that I've wanted to change the layout even more than that. Because I think we should create room to add an explanation about how the expiration works.
I think I'd shared some thoughts in the Discord channel a while ago that I could go back and find.
These? https://discord.com/channels/828768337978195969/906115083250307103/940951434537627698 https://discord.com/channels/828768337978195969/906115083250307103/940981535677505577
@eadsoy You are working late! 😮
Thanks for those links.
Ok, hold off from working on that branch until I push the commits I'm working on. We just had some conflicts, so I'm going to revert your most recent commit and then push a couple of mine.
And I think there has been some confusion about the definition of the expiration that we'll need to figure out.
@eadsoy At https://github.com/NEAR-Edu/near-certification-tools/pull/27/commits/4734bd643cc26358e4151eed9fd04e3b67ecdc19 I removed the "Expiring:" label from the SVG background files.
Then in https://github.com/NEAR-Edu/near-certification-tools/pull/27/commits/5f11b9f7aa54d015447107ac1ecc60425609b672 I removed all remaining traces of the expiration concept... to see what it would be like without it (at least temporarily, in case we decide that we should go ahead and deploy to production without an expiration feature holding us back).
If we do decide to re-add an expiration, we'd start by:
1. coming up with a clear definition
2. reverting that 5f11b9f commit, restoring getRawQuery and getExpiration
For 1, I think our definition is:
If at least one period of >180 consecutive days of inactivity after issueDate exists, expiration = the beginning of the first such period + 180 days. Otherwise, expiration = most recent activity + 180 days.
"activity" (and its opposite) are vague and need clarification.
Currently we've been assuming that "activity" = the join query below (using receipts and action_receipts), which might be fine.
SELECT TO_TIMESTAMP(R."included_in_block_timestamp"/1000000000) as moment
FROM PUBLIC.RECEIPTS R
LEFT OUTER JOIN PUBLIC.ACTION_RECEIPTS AR ON R.RECEIPT_ID = AR.RECEIPT_ID
WHERE SIGNER_ACCOUNT_ID = ${accountName}
AND R."included_in_block_timestamp" > ${issuedAtUnixNano}
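That definition in words could be sketched as a pure function (names are hypothetical; the real code derives these two dates from the indexer queries rather than taking them as arguments):

```typescript
// Sketch of the expiration rule:
// - if a >180-day gap of inactivity exists after issueDate, expiration =
//   start of the FIRST such gap + 180 days;
// - otherwise, expiration = most recent activity + 180 days.
const DAY_IN_MS = 86_400_000;

function expirationDate(
  firstLongGapStartAfterIssue: Date | null, // start of the first >180-day gap after issueDate, if any
  mostRecentActivity: Date,
  expirationDays = 180,
): Date {
  const base = firstLongGapStartAfterIssue ?? mostRecentActivity;
  return new Date(base.getTime() + expirationDays * DAY_IN_MS);
}
```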
The expiration definition is more complicated than I first thought, though; as you pointed out earlier, we still need to consider Scenario 2.
If you have other stuff to work on instead, feel free. Because points 2-8 might be more work than is worthwhile for these certificates in the short run.
I'd really like to give this a little more time (and go through the steps you mentioned in https://github.com/NEAR-Edu/near-certification-tools/issues/17#issuecomment-1050298947) before deciding to ditch the idea completely (in my humble opinion, of course).
If we do decide to re-add an expiration, we'd start by:
coming up with a clear definition ...
For 1, I think our definition is:
If at least one period of >180 consecutive days of inactivity after issueDate exists, expiration = the beginning of the first such period + 180 days. Otherwise, expiration = most recent activity + 180 days.
"activity" (and its opposite) are vague and need clarification.
Currently we've been assuming that "activity" = the join query below (using receipts and action_receipts), which might be fine. ...
reverting that 5f11b9f commit
What 'activity' really is could maybe become clearer after thinking about all the possible scenarios and our needs, as you mentioned in https://github.com/NEAR-Edu/near-certification-tools/issues/17#issuecomment-1049903552. I gave this a good thought already, but it definitely makes sense to step back and think about it without thinking about the code.
About reverting the 5f11b9f commit in case we want to keep showing the expiration date, and the confusion about what expiration is:
I see you changed the order of the if statement we had in getExpiration (in 5f11b9f), but really it should be in the order it was before.
I created some diagrams again and I just want to go through what we have (had) and what I was thinking, because I think we're on the same page about what expiration is, but the part I mentioned above creates some difference of opinion, I guess.
I will reference the red portion as Query - 1 and the other one as Query - 2.
If Query - 1 returned a result:
It returns moment, diff_to_previous, diff_from_last_activity_to_render_date as NULL, and has_long_period_of_inactivity as TRUE.
- moment is the END DATE of such a period.
- diff_to_previous is how much time has passed since the activity before that moment. We don't know this date; we only know how many days prior to moment it happened.
- diff_from_last_activity_to_render_date is NULL because this information is redundant in this query; it's used if the second query runs. It's set to NULL only to match the column count of both queries.
- has_long_period_of_inactivity refers to this query. It's TRUE if Query - 1 returned a result.
moment is the end date, so that is why we need to find (diff_to_previous - 180). Later on we need to subtract this difference from moment to get the exact expiration date.
In the code, the subtraction happens if has_long_period_of_inactivity is TRUE, meaning we're looking at the result of Query - 1.
If Query - 1 returns no result, Query - 2 runs:
It returns moment, diff_to_previous as NULL, diff_from_last_activity_to_render_date, and has_long_period_of_inactivity as FALSE.
- moment is the LAST ACTIVITY of the account.
- diff_to_previous is NULL because this information is redundant in this query; it's used if the first query runs. It's set to NULL only to match the column count of both queries.
- diff_from_last_activity_to_render_date is the days between the render date (now) and the last activity.
- has_long_period_of_inactivity is TRUE if Query - 1 returned a result, FALSE if Query - 2 returned a result.
With this result at hand, we check whether diff_from_last_activity_to_render_date is over 180 or not.
A. If not, the certificate hasn't expired yet, and the expiration date is last activity + 180 days, in the FUTURE.
Are we on the same page about what expiration is? Am I missing something? I really would like to make sure before going further.
@eadsoy Ok, you can proceed with step 2 (generating all the test rows we need). I regret not having us do this earlier. We would have been able to bypass all of this conversation because it would have been clearer and provable.
Another thing we'll need to look into is my TODO comment at https://github.com/NEAR-Edu/near-certification-tools/blob/4ea20f4d19897ec8dd912ac5d927a088baa6d6ff/web-app/pages/api/cert/%5BimageFileName%5D.ts#L113
Today I started with thinking about https://github.com/NEAR-Edu/near-certification-tools/issues/17#issuecomment-1049903552
For Step 2 https://github.com/NEAR-Edu/near-certification-tools/issues/17#issuecomment-1051326125
Am I on the right track with these resources or should I think of something else?
@eadsoy I hadn't thought Docker would be necessary and so was surprised when you mentioned it, but yeah I see it mentioned here https://www.prisma.io/docs/guides/testing/integration-testing so yeah maybe you need that.
I haven't heard of Synth, but if that's easier (or maybe it's necessary in addition), go for it.
Only thing is, I thought of a way to use only one query that only checks last activity, but I figured this would require a background job, which you eliminated from the start, so this is unimportant.
I am curious about this. Everything is a trade-off. So I always want to hear ideas that I hadn't considered before. Maybe cron jobs would be a good idea.
Answering the first part related to testing https://github.com/NEAR-Edu/near-certification-tools/issues/17#issuecomment-1054579969 :
I created a branch off of develop called `feature/prisma-test-setup` and a draft PR to basically set up tests as a separate feature.
I followed the Prisma integration testing tutorial. It refers to the unit testing setup at some points, but not all of those are actually used in integration testing, which makes it harder to follow. Steps I followed:
- Used `dotenv` to add the Test DB URL to `.env.test`.
- Added these scripts:

```json
{
  ...
  "docker:up": "docker-compose up -d",
  "docker:down": "docker-compose down",
  "test": "yarn docker:up && yarn prisma migrate deploy && dotenv -e .env.test jest -i",
  "migrate:postgres": "dotenv -e .env.test -- npx prisma migrate dev --name postgres-init"
}
```
It says that `account_changes_new_id_seq` doesn't exist; why wouldn't it, though? I will look into this.
That's where I'm at right now.
The flow of operations in the Prisma docs is as follows, which makes sense and is what I'm trying to accomplish:
- Start the container and create the database
- Migrate the schema
- Run the tests
- Destroy the container

Each test suite will seed the database before all the tests are run. After all the tests in the suite have finished, the data from all the tables will be dropped and the connection terminated.
I'm at the point where I'm thinking of migrating a simpler schema that represents the tables we need, and of integrating Synth to seed the db with data.
> I am curious about this. Everything is a trade-off. So I always want to hear ideas that I hadn't considered before. Maybe cron jobs would be a good idea.
This is what I was thinking, but we would need to save certified account names with their expiration dates, and probably a field called `isExpired` (set to `false` by default), in a db.
This one would also require a field called `nextCheck` or something on the db that shows the date of the next cron job for the account.
Method 1: When a certificate is issued, its expiration date is set to Issue Date + 180. The next cron job for the account is scheduled on `nextCheck`, which is also set to Issue Date + 180, so the account will be picked up by the cron job on `nextCheck`. On `nextCheck`, the cron job looks up the account's last activity and computes last activity + 180; the account's expiration date is updated to this new date, and the next cron job for the account (`nextCheck`) is set to last activity + 180, so the account will again be picked up on `nextCheck`. This goes on until inactivity is detected in a future cron job.
Pro: The cron job doesn't have to check all accounts each day; it only goes through the ones assigned for that day.
Con: If the certificate is rendered between cron jobs, it won't show the actual last activity + 180 but whatever expiration date the most recent cron job that checked this account's activity returned.
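If we did go down this road, Method 1's daily pass might look something like this sketch (all names here, such as `CertifiedAccount` and `nextCheck`, are illustrative, not existing code):

```typescript
// Illustrative sketch of Method 1 (not existing project code).
interface CertifiedAccount {
  accountId: string;
  expirationDate: Date; // currently-published expiration
  nextCheck: Date;      // when the cron job should re-check this account
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// The daily cron pass only examines accounts whose nextCheck has arrived.
function accountsDueToday(accounts: CertifiedAccount[], today: Date): CertifiedAccount[] {
  return accounts.filter((a) => a.nextCheck.getTime() <= today.getTime());
}

// If the indexer shows new activity, both dates move to lastActivity + 180.
function reschedule(account: CertifiedAccount, lastActivity: Date): CertifiedAccount {
  const newExpiration = new Date(lastActivity.getTime() + 180 * MS_PER_DAY);
  return { ...account, expirationDate: newExpiration, nextCheck: newExpiration };
}
```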
Method 2: When a certificate is issued, its expiration date is set to Issue Date + 180, and the `isExpired` field for the account is saved as `false`. Each day, the cron job goes through all accounts whose `isExpired` is `false` and checks their last activity dates. If any account's last activity is more than 180 days ago, the account's `isExpired` field is updated to `true`. Otherwise, if the last activity has changed since the previous day, the expiration date is recalculated and updated as last activity + 180.
Pro: The expiration date is always up to date.
Con: The cron job needs to go through all certified accounts where `isExpired` isn't true.
@eadsoy Regarding https://github.com/NEAR-Edu/near-certification-tools/issues/17#issuecomment-1055515773, I can't look at this today (and don't have experience with what you're doing anyway), but it seems like you're learning and making progress. Let me know tomorrow whether you get past what is blocking you there.
Regarding https://github.com/NEAR-Edu/near-certification-tools/issues/17#issuecomment-1055582452, in response to both Method 1 and Method 2, I have one tip. Almost always, it's best to minimize what data we save to a database. The main reason isn't to minimize storage size but instead to minimize confusion and risk of missing a step of updating something.
In Method 1, it seems to me that nextCheck field doesn't need to exist because its value would always be the same as expirationDate, right? And in Method 2, isExpired wouldn't need to be a field because the frontend would always be able to calculate whether something is expired based on comparing "now" and lastActivity (and looking at whatever diff threshold we defined).
Updating one column's value based on another column's value usually is unnecessary.
What I like about Method 1 (or a variation of it) is that our cron job (which we might set to run daily or hourly or every minute) would then, via a SQL query, narrow down to a subset of certified accounts that it would need to iterate over. But the drawback is that the published expiration date on the certificate images would then often be stale, because in reality certain students (hopefully many) would have had activity on mainnet, and their expirations should appear further out in the future, but the loop within the cron job wouldn't have bothered to look at them yet and update it.
The downside of Method 2 is that we'd be keeping every non-expired certificate in the loop, and if we have tens of thousands of certificates (hopefully!) I wonder if it will 1) consume expensive resources and take a long time and 2) maybe not even be relevant most of the time because 99% of those certificates never get viewed.
So I'm still leaning away from a cron job.
> Regarding https://github.com/NEAR-Edu/near-certification-tools/issues/17#issuecomment-1055515773, I can't look at this today (and don't have experience with what you're doing anyway), but it seems like you're learning and making progress. Let me know tomorrow whether you get past what is blocking you there.
I was able to set up the environment for tests. The current Prisma schema that reflects the indexer db has multiple foreign keys on receipts and action_receipts, which makes migrating with test data to our test db very hard. I have a question at this point: this schema is all we need to test our getExpiration and getRawQuery functions, right?
I manually changed the Prisma schema to leave only these two tables (receipts and action_receipts), without the foreign key connections, and migrated it to the test db. The query seems to run successfully against the test db (getRawQuery) on an account that has just one activity. (getExpiration throws an error in the tests, but I haven't looked into it yet and have removed it for now.)
Tomorrow I'm hoping to come back having solved these:
- Having the `getRawQuery` and `getExpiration` functions in a separate file for the tests. I should import them into the test file from their original location, which is `/web-app/pages/api/cert/[imageFileName].ts`.

> In Method 1, it seems to me that nextCheck field doesn't need to exist because its value would always be the same as expirationDate, right? And in Method 2, isExpired wouldn't need to be a field because the frontend would always be able to calculate whether something is expired based on comparing "now" and lastActivity (and looking at whatever diff threshold we defined).
You are absolutely right. I was thinking `expirationDate` would have a string type while `nextCheck` would have a date type, but I made it too complicated. Also, you're right again: Method 2 wouldn't need an `isExpired` field.
> What I like about Method 1 (or a variation of it) is that our cron job (which we might set to run daily or hourly or every minute) would then via a SQL query narrow down to a subset of certified accounts that it would need to iterate over. But the drawback is that the published expiration date on the certificate images then would often get stale because in reality certain students (hopefully many) would have had activity on mainnet, and their expirations should appear as further out in the future, but the loop within the cron job wouldn't have bothered to look at them yet and update it.
Exactly. That was the con I was thinking about.
> The downside of Method 2 is that we'd be keeping every non-expired certificate in the loop, and if we have tens of thousands of certificates (hopefully!) I wonder if it will 1) consume expensive resources and take a long time and 2) maybe not even be relevant most of the time because 99% of those certificates never get viewed.
Agreed completely!
> So I'm still leaning away from a cron job.
Yes I completely understand why it wouldn't be as efficient. Thank you for these detailed thoughts though! I'm taking notes from your answers, this was a great brainstorm for me.
@eadsoy In other systems in the past, I've needed to run a command that temporarily disables all foreign key checks and then run another command to re-enable all foreign key checks. I wonder if that would be appropriate and possible here.
If so, you'd be able to use the original real schema, right? You'd just temporarily disable all foreign key checks before the insertion of the seed data and then re-enable them after?
Or are there foreign keys from the main tables that we care about to other tables that we just don't care about? If so, yeah maybe we need to artificially create a similar schema but that doesn't lead us to extraneous work managing irrelevant FK relationships.
---I'm not working late right now ---
> @eadsoy In other systems in the past, I've needed to run a command that temporarily disables all foreign key checks and then run another command to re-enable all foreign key checks. I wonder if that would be appropriate and possible here.
That's exactly what we need! I will look into this before I go and try to create another schema. Thank you.
> If so, you'd be able to use the original real schema, right? You'd just temporarily disable all foreign key checks before the insertion of the seed data and then re-enable them after?
Yes, exactly.
> Or are there foreign keys from the main tables that we care about to other tables that we just don't care about? If so, yeah maybe we need to artificially create a similar schema but that doesn't lead us to extraneous work managing irrelevant FK relationships.
No no, we don't care about these foreign keys at all. So your suggestion looks like the solution if it's possible to implement. I will work on this tomorrow. Thanks!
But this didn't really work out. It didn't isolate the fields or tables while migrating to the test db.
Also, wherever `nextval('something'::regclass)` was used in the schema (e.g. `dbgenerated("nextval('account_changes_new_id_seq'::regclass)")`), it caused errors repeatedly, which I still don't understand.
I might give it another shot, but instead I created a test schema, and in the test script I'm referring to this schema while migrating to the test db.
This approach worked out fine.
Then I changed the test script to seed the generated data into the test db after migration. The script works but I'm getting this error:
`1: One or more database inserts failed: error returned from database: invalid sign in external "numeric" value`
I have an idea why this might be happening but I need to come back with a clear mind.
The `getExpiration` and `getRawQuery` functions are still copied into a separate file for the tests. To say they are copied is actually wrong as well: the line here causes problems for some reason in the getExpiration function, and I just worked with a workaround that returned the correct data. The tests ran successfully, but this definitely needs to change.
```ts
const result: RawQueryResult = await prisma.$queryRaw<RawQueryResult>`${rawQuery}`;
```
I will also check this tomorrow:

```ts
const issuedAtUnixNano = dayjs(issuedAt).unix() * 1_000_000_000; // TODO: See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER and add a comment here and in JSDoc for functions that have a `number` argument for dates. Explain why it's safe to use floating point `number` type (if it is), or switch to a better approach and explain it.
```
Tomorrow I will:
- `getExpiration` and `getRawQuery`

@eadsoy Thanks for the update!
It's helpful to have random data, sure. But for certain tests (the ones I mention in step 2 of https://github.com/NEAR-Edu/near-certification-tools/issues/17#issuecomment-1050298947) you'll also want to very intentionally come up with a plan to prove that our function can handle every relevant scenario.
As a former coworker of mine liked to say "if it's untested, it doesn't work", meaning: "Until you've written a good test for something, it officially doesn't work." We won't always want to dedicate effort to writing tests, but definitely for this case we do want to because time/calendar-based stuff is so tricky.
Forcing ourselves to write tests actually also forces us to write cleaner, more testable functions in the first place (which is one reason I like requiring functions to be so few lines).
> It's helpful to have random data, sure. But for certain tests (the ones I mention in step 2 of https://github.com/NEAR-Edu/near-certification-tools/issues/17#issuecomment-1050298947) you'll also want to very intentionally come up with a plan to prove that our function can handle every relevant scenario.
To check if the query returns the expected result and whether it returns the expected expiration date, I seeded the db from the test itself. Synth has an option to use a data source, but while it's relatively easy to generate a fixed username, generating data from an array of fixed numbers (for `included_in_block_timestamp`) is harder. They have an option to generate data from a data source, but it didn't work for some reason. This wasn't really a blocker, since I was able to generate the data another way, but I will look into it. I would like to seed the db using Synth in the end.
> As a former coworker of mine liked to say "if it's untested, it doesn't work", meaning: "Until you've written a good test for something, it officially doesn't work." We won't always want to dedicate effort to writing tests, but definitely for this case we do want to because time/calendar-based stuff is so tricky.
> Forcing ourselves to write tests actually also forces us to write cleaner, more testable functions in the first place (which is one reason I like requiring functions to be so few lines).
Thank you for sharing these thoughts. I will be very happy if we have good tests we can rely on in the end of this.
I'd like to ask you a couple questions:
- I tried importing `getExpiration` and `getRawQuery` directly into the test file from `/pages/api/cert/[imageFileName].ts` (I am using the last version you refactored before removing these), but what happened is that it ran the whole file line by line. I mean it also tried to run `getNftContract`, for instance. At this point, all information pointed to mocking these functions to isolate them, but I'm not getting why I should do that in this case.
https://jestjs.io/docs/mock-function-api

I obviously am missing something here and will have to figure this out, but for now I'm keeping these functions in separate files (which I don't like) and importing them into the tests from there. I have also added a query function to only retrieve the query result, and I'm using it to check whether the query returns the correct result.
At the moment the tests at hand are working fine, but refactoring is much needed, and the above points also need to be figured out.
Also the existing tests need to be more descriptive:
@eadsoy Thanks for the update.
If we keep the `getRawQuery` function and it remains as simple as it currently is, I'd be fine with leaving it untested since it's super basic string replacement (as long as you remember to explore https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER as mentioned at https://github.com/NEAR-Edu/near-certification-tools/blob/4ea20f4d19897ec8dd912ac5d927a088baa6d6ff/web-app/pages/api/cert/%5BimageFileName%5D.ts#L113).
getExpiration (at least as of this commit https://github.com/NEAR-Edu/near-certification-tools/blob/4ea20f4d19897ec8dd912ac5d927a088baa6d6ff/web-app/pages/api/cert/%5BimageFileName%5D.ts#L100 ) isn't exported yet but probably needs to be, right? I don't have much experience with writing tests for NodeJS but I'd assume the test might not have access to the function unless it's exported.
Yes, the names/descriptions/hints about the tests should reveal how you think about each scenario.
> If we keep the getRawQuery function and it remains as simple as it currently is, I'd be fine with leaving it untested since it's super basic string replacement
I'm not testing this function actually; I had to separate it since the `getExpiration` function calls it. Again, importing it from its original path didn't work.
And yes, checking https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER is on my list, but I didn't have the time today. It's one of my priorities; I am going to check it.
> getExpiration isn't exported yet but probably needs to be, right? I don't have much experience with writing tests for NodeJS but I'd assume the test might not have access to the function unless it's exported.
I added `export` to both, and the import in the test file succeeds as well (it suggests the correct arguments to use), but it just doesn't do what I'm expecting. I am a bit overworked right now, and I'm sorry for not providing the sources I looked through to have a healthier discussion. Let me look into it again on Monday, and if I still face this problem, I will ask more precisely. Would that be ok?
@eadsoy Of course! You never need to feel pressure to respond after work hours.
@ryancwalsh I'm currently trying to write an explanation for: https://github.com/NEAR-Edu/near-certification-tools/blob/4ea20f4d19897ec8dd912ac5d927a088baa6d6ff/web-app/pages/api/cert/%5BimageFileName%5D.ts#L113
But before doing so I wanted to make sure I understand what kind of explanation we want there. These are my findings:
So, are we seeing the alternative as using BigInt instead of number here? In that case, BigInt seems to be way slower in performance compared to the number type, and f64 would already be sufficient for our use case.

> Computing with a float64 is cheap, you get sub-microsecond precision nowadays, you don't need to pre-coordinate about milliseconds versus microseconds versus (second, nanosecond) pairs et cetera, as long as you're not counting individual nanoseconds you should be great.

So it seems logical to use f64.
I'm not sure what exactly to write in the comment. From what angle should I approach it?
I will write the other updates I have on another comment now, thank you.
Additional comment: I think I contradicted myself in the previous comment; we're surpassing MAX_SAFE_INTEGER when converting to nanoseconds, which is also mentioned here: https://randomascii.wordpress.com/2020/09/27/floating-point-in-the-browser-part-1-impossible-expectations/
So it will round to the closest double in our case, right? Which makes it unsafe.
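A quick demonstration of that overflow (standalone TypeScript, not project code):

```typescript
// Epoch seconds * 1e9 lands around 1.6e18, far above
// Number.MAX_SAFE_INTEGER (2^53 - 1, about 9.007e15), so nanosecond
// timestamps held in a `number` can silently round to a nearby double.
const seconds = 1_646_000_000; // roughly "now" at the time of writing, in epoch seconds
const nanosAsNumber = seconds * 1_000_000_000;

console.log(Number.isSafeInteger(nanosAsNumber)); // false

// Native BigInt keeps exact integer precision (bn.js serves the same
// purpose in near-api-js):
const nanosAsBigInt = BigInt(seconds) * 1_000_000_000n;
console.log(nanosAsBigInt.toString()); // "1646000000000000000"
```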
I will write my updates and come back to this. I'm sorry I got confused for a second going through all the resources.
@eadsoy I see that near-api-js uses bn.js: https://github.com/near/near-api-js/blob/e92725352a5ad33db7ba97f4d181da44af88267f/package.json#L14-L27
So maybe look into that. https://github.com/indutny/bn.js
@eadsoy By the way, whenever I'm throwing stuff at you that is not clear to me how to do, that's a great sign. :-) I appreciate you figuring things out for us.
Changes made in https://github.com/NEAR-Edu/near-certification-tools/commit/15522d3585a09b9b1680c248398f3694e7201add
I changed the column data type back to `Decimal` for the `included_in_block_timestamp` column in `web-app/prisma/test-schema.prisma`, as it is also `Decimal` in the original schema. I had changed it to `BigInt` previously because the test db was throwing an error while inserting data via Synth for the `included_in_block_timestamp` column. I thought I was probably doing something wrong, but it turns out I should have trusted my instincts and looked more into the error. Lesson learned. It looks like this is a known issue with Synth: numeric types in Postgres cause an error while inserting into the DB using Synth. I'm pasting a screenshot so as not to reference the issue here, which I learned the hard way today:
Instead of Synth, I used Prisma's own seed method to populate the test db. What was challenging to figure out at first was that we use `prisma migrate dev` to migrate the test-db schema while running the test script for the first time, and this also runs the seed file if it's present in the prisma directory.
But after tests finish I clear out the db, and when the test script is run again, `prisma migrate dev` won't run the seed file again because it only runs it when it detects a new migration. That's why I added the `--skip-seed` flag in the `migrate:postgres` script, and I'm running `prisma db seed` manually in the `test` script after the migration. It works fine right now:
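To illustrate, the scripts might end up shaped roughly like this (a sketch based on the snippet earlier in this thread; the exact commands in the repo may differ):

```json
{
  "scripts": {
    "docker:up": "docker-compose up -d",
    "docker:down": "docker-compose down",
    "migrate:postgres": "dotenv -e .env.test -- npx prisma migrate dev --name postgres-init --skip-seed",
    "test": "yarn docker:up && yarn migrate:postgres && dotenv -e .env.test -- npx prisma db seed && dotenv -e .env.test jest -i"
  }
}
```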
So I amended my commit, thinking that removing the link from the commit message would delete that reference, and pushed again. This indeed changed the commit message, and there's no link anymore, but the original commit is still there, detached from the commit history: https://github.com/NEAR-Edu/near-certification-tools/commit/760259c2aed9406304996beec42fcf8a50bd23be . So this didn't delete the reference in the Synth GitHub issue. I am afraid to directly reference any GitHub issue now, so I'll be pasting the issue link provided in a Stack Overflow comment; it seems that people are unhappy about this as well: https://stackoverflow.com/questions/43872026/remove-issue-reference-in-github#:~:text=There is an,at 6%3A33
The solution seems to be contacting GitHub about it and asking for removal. I am not sure if sharing the repo there is a privacy problem for us, since the repo is already public, but even so, this wasn't my intention. Would you like me to contact GitHub support?
I'm thinking of removing Synth completely and only using the seed data, if you don't see any use case for random data generation, which would again fail with the current schema, which has the `Decimal` type for `included_in_block_timestamp`. We would need to change it back to `BigInt` if we want to keep using Synth.
After solving the floating-point number type issue for calculating timestamps, the only thing left will be thinking more about other test cases, if there are any. I'm probably missing other scenarios; the tests are basically checking for only two scenarios, and that is all.
> @eadsoy I see that near-api-js uses bn.js: https://github.com/near/near-api-js/blob/e92725352a5ad33db7ba97f4d181da44af88267f/package.json#L14-L27
> So maybe look into that. https://github.com/indutny/bn.js
I will do that, thank you!
> @eadsoy By the way, whenever I'm throwing stuff at you that is not clear to me how to do, that's a great sign. :-) I appreciate you figuring things out for us.
Thank you so much! I really appreciate getting issues thrown at me, and I'm not taking it for granted. :) I might get lost too deep in details and not see what's in front of me sometimes, as I just did while looking into the floating-point type, but I love the learning experience. Thank you for your patience.
Another update I forgot to mention: I put `getRawQuery` and `getExpiration` in a separate file, `/web-app/helpers/expiration-date.ts`, because exporting these functions and importing them in the test file was still running the whole file (`/web-app/pages/api/cert/[imageFileName].ts`) from the top, and therefore throwing an error. I hope having them isolated in the helpers folder is ok; I thought they fit well there. Some links I found on what might be causing this: https://stackoverflow.com/a/60198745/10684149 https://stackoverflow.com/a/37325716/10684149 https://stackoverflow.com/a/61935995/10684149
Re: https://stackoverflow.com/questions/43872026/remove-issue-reference-in-github#:~:text=There%20is%20an,at%206%3A33 That's interesting and unfortunate that GitHub doesn't let us clean up mishaps like that. Not a big deal though.
What are the `.json.bak` files for?
Upon image rendering, the system should check for all activity of the mainnet account since the certificate issue date.
If the indexer is unavailable:
The cert could say simply "Expires: after the first 6-month period of inactivity of this mainnet account"
If the indexer is available (which is the typical case):
The cert calculates the expiration and, if expired, simply says "Expired: 2021-12-03", but if still active says "Expires: after the first 6-month period of inactivity of this mainnet account (2022-08-06 if no future activity)".
But in the short run, since Dan hasn't had time to update designs, we could write an `*` and then insert a new line underneath that starts with `*` and then explains. See https://discord.com/channels/828768337978195969/906115083250307103/940981535677505577 and the discussion before and after https://discord.com/channels/828768337978195969/906115083250307103/943511742229667901
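The rule above could be sketched like this (the function and parameter names are hypothetical; the strings and dates are just the examples from this comment):

```typescript
const FALLBACK =
  'Expires: after the first 6-month period of inactivity of this mainnet account';

// Hypothetical sketch of the rendering rule described above.
function expirationLine(
  indexerAvailable: boolean,
  expiration: Date | null, // null when no expiration could be computed
  now: Date,
): string {
  if (!indexerAvailable || expiration === null) {
    return FALLBACK; // indexer down: we can't compute a concrete date
  }
  const iso = expiration.toISOString().slice(0, 10); // YYYY-MM-DD
  return expiration.getTime() <= now.getTime()
    ? `Expired: ${iso}`
    : `${FALLBACK} (${iso} if no future activity)`;
}

console.log(expirationLine(true, new Date('2021-12-03T00:00:00Z'), new Date('2022-03-01T00:00:00Z')));
// Expired: 2021-12-03
```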
My next challenge (which you are welcome to explore, but might feel unfamiliar and harder) is how to change getExpiration in https://github.com/NEAR-Edu/near-certification-tools/blob/2d77d6af8594bfe0d456a9aa9c4c5a91efe382d9/web-app/pages/api/cert/%5BimageFileName%5D.ts#L82 to use Prisma. Prisma probably allows "cheating" and using raw PostgreSQL queries. Getting that working would be a first step.
Then ideally later we'd do it in an abstract way using Prisma's style without a raw query (which would give us flexibility in case the indexer ever decides to use a DB other than Postgres).
You can see a sample Prisma query in getMostRecentActivityDateTime, which is a function that probably needs to be deleted, or at least reconceived, since we now have a different definition of how we want certificates to expire.