Closed: thecodingwizard closed this issue 9 months ago
How would full automation work? Would we have to continuously ping the USACO website to check for new problems/contests?
For now, a "get problems and update files" script would probably suffice? That way after every contest we only need one person to run a script and update everything.
Is access to the Algolia database public? It's separate from this repo right?
Access to algolia is not public, and is indeed separate from the repository.
However, the format in which we store algolia items is public: https://github.com/cpinitiative/usaco-guide/blob/62f87bb4bcecb2b67722dbc49a443465b1f68544/gatsby-config.ts#L115 It's just the API key that is private.
So if you are interested in working on this, I believe you should be able to test locally by making your own Algolia account and using your own API key!
On a semi-related note, is the local Algolia search run during `yarn develop` synced with the one used by usaco.guide?
When I search "Balancing a Tree" in problems search and click "View Solution" on the actual usaco.guide website, it redirects correctly to the internal solution, but it doesn't work on my local clone (it redirects to the old external USACO solution instead).
"View Solution" links to different pages on usaco.guide vs on a local build
What's more peculiar is that the "View Solution" link works for most other internal solutions locally, just not Balancing a Tree; maybe it's because the internal solution for this problem was created relatively recently?
Good observation! I don't think these are synced. I believe there is a "development" version of Algolia (so it's easier to test changes locally without screwing it up in production).
Changes are actually synced when the project is built and deployed (though sometimes this is a little finicky if I recall correctly)
Hm, what do you mean by "changes are synced"? Because when I run it locally even on the master branch the Algolia tables still don't match
There are two indexes in Algolia: `prod_problems` and `dev_problems`. The actual website runs on `prod_problems`. Local development uses the `dev_problems` index (I think). This is probably why running locally yields different Algolia tables than production.
To develop locally, the easiest way is probably to make your own Algolia account and change the API key / index name to use your own Algolia setup. Then you can test changes / modify your own index without interfering with the production website.
When a PR gets merged / Vercel builds the site, the gatsby algolia plugin runs and updates the `prod_problems` index. That's what I meant by "changes are synced" -- if you modify the algolia schema / want to change an algolia index using the gatsby algolia plugin, the changes are only deployed to the production algolia index when the site is built on Vercel.
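For reference, the plugin wiring in gatsby-config.ts is roughly shaped like this (a hedged sketch, not the exact config; `problemsQuery` and the transformer are placeholders — see the linked gatsby-config.ts for the real queries):

```typescript
// Hedged sketch of how gatsby-plugin-algolia is typically wired up.
// The real queries/transformers live in the repo's gatsby-config.ts.
const problemsQuery = `{ allProblemInfo { edges { node { uniqueId url tags difficulty } } } }`; // placeholder

const algoliaPlugin = {
  resolve: 'gatsby-plugin-algolia',
  options: {
    appId: process.env.ALGOLIA_APP_ID,   // your own app id when testing locally
    apiKey: process.env.ALGOLIA_API_KEY, // admin key (private), not the search-only key
    queries: [
      {
        query: problemsQuery,
        // flatten GraphQL edges into one Algolia record per problem
        transformer: ({ data }: any) =>
          data.allProblemInfo.edges.map((e: any) => e.node),
        indexName: 'dev_problems', // point this at your own index while testing
      },
    ],
  },
};
```

So pointing `indexName` (and the env variables) at your own Algolia account should let you push to your own index without touching production.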
Then when is `dev_problems` updated?
If you have the API keys to CPI's algolia account, you can update `dev_problems` locally. I forget when exactly it is updated, to be honest. I think `dev_problems` was used a lot when the problems search was being developed, but now it is not really used anymore since there isn't development on problems search.
side note: is there any specific reason why problem search is still in beta?
> To develop locally, the easiest way is probably to make your own Algolia account and change the API key / index name to use your own Algolia setup. Then you can test changes / modify your own index without interfering with the production website.

will look into it 🙏🏻
I think there were a lot of other features we were thinking of adding to the problems search page, but we just never got around to it…
> However, the format in which we store algolia items is public:
> It's just the API key that is private.

Interesting, so when we run the website locally without any additional configuration, where are these environment variables (ALGOLIA_APP_ID/ALGOLIA_API_KEY) coming from?
Also, I've created a .env file with the following content:

```
ALGOLIA_APP_ID=XXXXXX
ALGOLIA_API_KEY=XXXXXXXXX (Admin API Key)
```

and within the Algolia project I've made an index called `dev_problems`, but I don't think it's changed anything. Is there another step I missed?
update 1: forgot to run `yarn build` earlier but am now running into the following error:
```
Error: Record at the position 86 objectID=intro-ds is too big size=15157/10000 bytes. Please have a look at https://www.algolia.com/doc/guides/sending-and-managing-data/prepare-your-data/in-depth/index-and-records-size-and-usage-limitations/#record-size-limits
```
update 2: after setting `ALGOLIA_INDEX_NAME` to `dev` and `NODE_ENV` to `development`, I now somehow get a new error:

```
Error: Error loading a result for the page query in "/problems/ccc-firehose/solution". Query was not run and no cached result was found.
```
update 3: error from update 2 has been fixed in #4084 🙏🏻 still running into this error though:

```
Error: Record at the position 86 objectID=intro-ds is too big size=15157/10000 bytes. Please have a look at https://www.algolia.com/doc/guides/sending-and-managing-data/prepare-your-data/in-depth/index-and-records-size-and-usage-limitations/#record-size-limits
```
do I need a paid plan to store the full database?
> I think there were a lot of other features we were thinking of adding to the problems search page, but we just never got around to it…

Is working on them still a possibility? :)
Oh yikes... we do have a paid plan which is probably why we didn't run into this issue before. But I'm not sure why the object is so big -- maybe we're storing some information we don't actually need to store in that object?
> Is working on them still a possibility :)

I personally do not have plans to work on them, but if you happen to have time and want to, that would be much appreciated!!
Okay, after looking into it, I think it's because the module object in algolia contains the full text content of the module, which is very big for intro DS. Two solutions: either clip the content length to 9k characters (less ideal), or improve the way we extract text from the article (better). For example, quiz questions don't need to be extracted, code does not need to be extracted, etc.
Another approach would be to make every section in every module its own object which might improve search, but this would be harder.
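As a sketch of the "improve extraction" option (the second of the two solutions above), assuming the module text is available as a raw MDX string — the real pipeline works on the MDX AST in gatsby-config.ts, so this is illustrative only:

```typescript
// Illustrative only: strip content that doesn't need to be searchable
// (fenced code blocks, <Quiz> components), then clip as a final safety net
// against Algolia's 10 KB record-size limit.
function extractSearchableText(mdx: string, maxLen = 9000): string {
  const withoutCode = mdx.replace(/```[\s\S]*?```/g, ' ');
  const withoutQuizzes = withoutCode.replace(/<Quiz>[\s\S]*?<\/Quiz>/g, ' ');
  return withoutQuizzes.replace(/\s+/g, ' ').trim().slice(0, maxLen);
}
```

Even with better extraction, keeping the final clip means a single huge module (like intro DS) can never break the index push.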
To be honest, we should probably figure out a more ideal way to implement search; right now, the search quality isn't very good I think. The `prod_modules` index just powers the search-for-modules functionality on the website.
If you need access to our paid Algolia account and you're a CPI team member, let me know -- we may be able to set something up for you.
> If you need access to our paid Algolia account and you're a CPI team member, let me know -- we may be able to set something up for you.

I am a team member, but it's ok if that's too much of a hassle to set up
> To be honest, we should probably figure out a more ideal way to implement search; right now, the search quality isn't very good I think.

Is the metadata in the search results (`id:`, `title:`, etc.) intentional? It makes the search results look a bit less professional imo
> Two solutions: either clip the content length to 9k characters (less ideal), or improve the way we extract text from the article (better).

Would definitely be nice to look into, although this probably isn't a priority since I think not many people plan on running their own local algolia clone anyway
> I personally do not have plans to work on them, but if you happen to have time and want to, that would be much appreciated!!

I might; is there a list of some of these planned features?
> Is the metadata in the search results (`id:`, `title:`, etc.) intentional? it makes the search results look a bit less professional imo

no! it would be nice if we got rid of it.
> Would definitely be nice to look into, although this probably isn't a priority since I think not many people plan on running their own local algolia clone anyway

I think improving the way we extract text (ex. by getting rid of the metadata in your screenshot) would help with production search as well, not just local development.
> I might; is there a list of some of these planned features?

> If you need access to our paid Algolia account and you're a CPI team member, let me know -- we may be able to set something up for you.
>
> I am a team member but it's ok if that's too much of a hassle to set up

also, is this still a possibility?
Also, I'm trying to use my own Algolia client like so:

```ts
import algoliasearch from 'algoliasearch/lite';

export const searchClient = algoliasearch(
  process.env.ALGOLIA_APP_ID ?? '3CFULMFIDW',
  process.env.ALGOLIA_API_KEY ?? 'b1b046e97b39abe6c905e0ad1df08d9e'
);
```

(I'm using the `??` so it still works for people who don't have the env variables set.)
It works when I just directly do:

```ts
export const searchClient = algoliasearch(
  'my_app_id',
  'my_api_key'
);
```

However, the first snippet still defaults to the old values; are the env variables somehow uninitialized when `searchClient` is initialized?
> also, is this still a possibility?

yes, can you dm me on Discord? @thecodingwizard
> are the env variables somehow uninitialized when searchClient is initialized?

How are you setting the environment variables? I think if you put them in an `.env` file in the root directory it should work, but I could be wrong.
I do have them in a `.env` file, but it's still not working. I tried putting `require('dotenv').config()` at the top of `algoliaSearchClient.ts` but I got this error:

```
BREAKING CHANGE: webpack < 5 used to include polyfills for node.js core modules by default.
This is no longer the case. Verify if you need this module and configure a polyfill for it.
```
Hm, sorry, I'm not actually sure what's wrong then...
Update: it somehow works now...
https://github.com/cpinitiative/usaco-guide/commit/08bd022e1657d2260cfd613098a87a4949056414 in #4086
Although, the RefinementList no longer loads for me locally; is it dependent on the `*_modules` index?
You may have needed to restart `yarn dev`? Not sure.
I think you're correct that RefinementList is dependent on some Algolia configuration (my guess is `*_problems`). I attached exports of our configuration for `*_modules` and `*_problems` here
Hm, I suspect it's `*_modules`, because I don't have the modules index on my own copy (due to the content-too-large reasons mentioned earlier), and that's likely why the RefinementList doesn't load? (Also, the RefinementList categories are basically just the module names.)
you can make a modules index, then populate it by running `gatsby build`, I think!
> you can make a modules index, then populate it by running gatsby build I think!

Yeah, although, as I don't have a paid account, I unfortunately run into this issue:

> Okay, after looking into it, I think it's because the module object in algolia contains the full text content of the module, which is very big for intro DS. Two solutions: either clip the content length to 9k characters (less ideal), or improve the way we extract text from the article (better). For example, quiz questions don't need to be extracted, code does not need to be extracted, etc.

I would just clip the length for local development purposes for now…
edit: nvm, didn't notice the separate `div_to_probs` file
In https://github.com/cpinitiative/usaco-guide/blob/master/src/components/markdown/ProblemsList/DivisionList/DivisionList.tsx (the code for the monthlies table), the following query appears:
```ts
const data = useStaticQuery(graphql`
  query {
    allProblemInfo(
      filter: { source: { in: ["Bronze", "Silver", "Gold", "Plat"] } }
    ) {
      edges {
        node {
          solution {
            kind
            label
            labelTooltip
            sketch
            url
            hasHints
          }
          uniqueId
          url
          tags
          difficulty
          module {
            frontmatter {
              id
            }
          }
        }
      }
    }
  }
`);
```
When I run this query in graphiQL, problems such as Equal Sum Subarray, which isn't linked to a module, don't appear in the results (however, Piling Papers does, despite being more recent, because it's linked to a module). However, Equal Sum Subarray does show up in the monthlies table itself, which doesn't make a whole lot of sense to me; isn't all the content in the monthlies table extracted from this graphql query?
progress: in order to get usaco problems to show up in problems search, we just have to add them to `extraProblems.json`, so I wrote a script to do that:
https://github.com/devo1ution/usaco-guide/blob/algolia/usaco_util.mjs

It prompts for the problem id, generates the corresponding JSON by querying the problem page, and adds it to extraProblems.json. Updated the code to use `JSON.stringify()`; it puts all the array elements (tags) on different lines, but this gets fixed by the prettier pre-commit hook :) so this isn't even necessary anymore.
TODO:

- `divToProbs`
- `idToSol`
@thecodingwizard not quite sure how this works, but if I queried the usaco website once for every possible problem id (~1500 times) to keep the table up to date, would that overload the server?
LOL maybe we can figure out some more efficient solution. Perhaps we can let the user specify which monthly contests they would like to scrape, or we can intelligently scrape from the most recent contest to the oldest contest, stopping whenever we encounter a problem that we have already seen before.
If you're querying solely based off problem ID, maybe you can assume that they are chronologically increasing? I'm not entirely sure...
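Under that assumption, the incremental scrape could be sketched like this (hypothetical helper; `idsToScrape` is not an existing function in the repo):

```typescript
// Sketch of the "scrape newest-first, stop at already-seen problems" idea.
// Assumes problem ids (cpid) increase roughly chronologically, as suggested
// above; `knownIds` would be built from extraProblems.json plus the modules.
function idsToScrape(newestId: number, knownIds: Set<number>): number[] {
  const todo: number[] = [];
  for (let id = newestId; id >= 1; id--) {
    if (knownIds.has(id)) break; // everything older is already indexed
    todo.push(id);
  }
  return todo;
}

// Each remaining id would then be fetched one at a time, e.g. from
// https://usaco.org/index.php?page=viewproblem2&cpid=<id>
// (the page usaco_util.mjs already queries), ideally with a delay between
// requests so the server isn't hammered.
```

This keeps the number of requests proportional to the number of new problems per contest rather than ~1500 per run.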
Thanks for all your work here! This will be super helpful for the upcoming contest :)
Just wrote a script that added all old problems to `extraProblems.json` if they weren't already there or in a module:
https://github.com/cpinitiative/usaco-guide/pull/4086/commits/7f4baecbdf39585f401eafed8a9ce9354cd8c8a0
(although the difficulties will need to be manually tweaked)
i also pushed these changes to dev_problems in case you want to mess around with them
also edited `usaco_util.mjs` to prompt for difficulty too
Although, for whatever reason, my code fails the tsc status check now and I'm not sure why.
also minor change: I renamed `ALGOLIA_APP_ID` to `GATSBY_ALGOLIA_APP_ID` so it can be accessed in `algoliaSearchClient`; although I have provided default values in `algoliaSearchClient`, so it shouldn't make too much of a difference.
> (although the difficulties will need to be manually tweaked)

I wonder if it's possible to create a new difficulty value of "Unknown". might need to tweak a lot of the UI rendering stuff too though..
> @thecodingwizard not quite sure how this works, but if I queried the usaco website once for every possible problem id (~1500 times) to keep the table up to date, would that overload the server?
i mean just query the last couple? you can set a floor and query up from that every so often
@thecodingwizard side note: would it be possible to have pull request branches push to `dev_problems` instead of `prod_problems` and also use `dev_problems`? That way algolia updates are easier to preview
> i mean just query the last couple? you can set a floor and query up from that every so often

yeah I think we can keep the latest problem id as a repo secret and then we can set up timed workflows, like for each season (December/Jan/Feb/March 20th), so we can just incrementally update automatically
> I wonder if it's possible to create a new difficulty value of "Unknown". might need to tweak a lot of the UI rendering stuff too though..

alr I added a new "N/A" difficulty class that shows a tooltip when you hover over it: "This problem was added automatically; if you want to suggest a difficulty, feel free to make a pull request!"
Also, I refactored the difficulty box (the little thing that says Easy/Hard/Insane) into a separate file but am now running into this warning:

```
warn chunk commons [mini-css-extract-plugin]
Conflicting order. Following module has been added:
 * css ./node_modules/gatsby/node_modules/css-loader/dist/cjs.js??ruleSet[1].rules[9].oneOf[1].use[1]!./node_modules/postcss-loader/dist/cjs.js??ruleSet[1].rules[9].oneOf[1].use[2]!./node_modules/tippy.js/themes/material.css
despite it was not able to fulfill desired ordering with these modules:
 * css ./node_modules/gatsby/node_modules/css-loader/dist/cjs.js??ruleSet[1].rules[9].oneOf[1].use[1]!./node_modules/postcss-loader/dist/cjs.js??ruleSet[1].rules[9].oneOf[1].use[2]!./node_modules/tippy.js/themes/light.css
 - couldn't fulfill desired order of chunk group(s) component---src-pages-problems-tsx, component---src-pages-problems-tsxhead
 - while fulfilling desired order of chunk group(s) component---src-pages-dashboard-tsx, component---src-templates-module-template-tsx, component---src-templates-solution-template-tsx,
```

Any idea how to resolve it?
edit: nvm, just had to alphabetically reorder the imports
> side note: would it be possible to have pull request branches push to dev_problems instead of prod_problems and also use dev_problems? That way algolia updates are easier to preview

It seems like this should be possible, but I am not sure how. I think in Vercel you can set environment variables dependent on whether it is production or preview, so perhaps we can add an environment variable that specifies the algolia prefix to use. (I think we might already have an environment variable named `ALGOLIA_INDEX_NAME`.)
> It seems like this should be possible, but I am not sure how. I think in Vercel you can set environment variables dependent on whether it is production or preview, so perhaps we can add an environment variable that specifies the algolia prefix to use. (I think we might already have an environment variable named `ALGOLIA_INDEX_NAME`.)

Could you try setting `ALGOLIA_INDEX_NAME` to `dev`? Idt I have access to the Vercel
Done! Vercel needs to re-build before the changes will take effect (I triggered a manual rebuild for your Algolia PR).
Hm, it seems like the most recent vercel build is still using prod_problems: https://usaco-guide-ci2palbvi-cpinitiative.vercel.app/problems/ (e.g. try searching "fertilizing pastures"; it's present in dev_problems but doesn't show up here). Or am I misunderstanding what rebuild means?

edit: oops, turns out the index doesn't depend on ALGOLIA_INDEX_NAME! In problems.tsx:

```ts
const indexName =
  process.env.NODE_ENV === 'production' ? 'prod_problems' : 'dev_problems';
```

can I change this to depend on ALGOLIA_INDEX_NAME instead? e.g.

```ts
const indexName = `${process.env.ALGOLIA_INDEX_NAME}_problems`;
```

edit 2: we also have to rename `ALGOLIA_INDEX_NAME` to `GATSBY_ALGOLIA_INDEX_NAME` so components can actually access it
> can I change this to depend on ALGOLIA_INDEX_NAME instead? e.g.

Yes, that makes sense! Though, maybe as a fallback (i.e. if `process.env.ALGOLIA_INDEX_NAME` is not defined), default to what we had previously?

> we also have to rename ALGOLIA_INDEX_NAME to GATSBY_ALGOLIA_INDEX_NAME so components can actually access it

oops sorry, why is this the case? (like why do we need the GATSBY_ prefix / what was the reasoning for renaming ALGOLIA_INDEX_NAME to GATSBY_ALGOLIA_INDEX_NAME?)
I have the index defaulted to dev_problems for now, but I can change that to prod if you want!

As for the env variables, it's a bit obscured in the docs, but env variables without the GATSBY_ prefix can only be accessed in gatsby-config.ts (which is why my env variables weren't working properly earlier, I think).
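Putting the two pieces together, the index selection could look something like this (a sketch of the fallback behavior discussed in this thread; `problemsIndexName` is a hypothetical helper, not existing repo code):

```typescript
// Hypothetical helper: prefer an explicit index prefix (from the
// GATSBY_-prefixed env variable, which is visible in browser code),
// otherwise fall back to the old NODE_ENV-based behavior.
function problemsIndexName(
  prefix: string | undefined,
  nodeEnv: string | undefined
): string {
  if (prefix) return `${prefix}_problems`;
  return nodeEnv === 'production' ? 'prod_problems' : 'dev_problems';
}

// In problems.tsx this would be called roughly as:
// const indexName = problemsIndexName(
//   process.env.GATSBY_ALGOLIA_INDEX_NAME,
//   process.env.NODE_ENV
// );
```

With this shape, Vercel preview deployments can set `GATSBY_ALGOLIA_INDEX_NAME=dev` while production either sets `prod` or relies on the fallback.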
Oh wow, I had no idea. I added `GATSBY_ALGOLIA_INDEX_NAME` to both prod and dev, and triggered a rebuild for your branch.

Defaulting to `dev_problems` seems fine to me!
can the Dec 2023 problems be added?
Any time USACO releases new problems, the following things need to be done (potentially incomplete):

- update the `usacoProblems` search for the USACO Guide IDE to include the new problems

They're done manually right now, but we should figure out how to do it automatically (or at least make it easier to do everything manually).