Thomas-Lemoine closed this 1 year ago
I fetched a bunch of Arbital articles to check that it worked correctly, and it seems fine, but occasionally there is output like:
The reasoning for an instrumental convergence claim says that for many utility functions $U_k$ and situations $S_i$ a $U_k$-consequentialist in situation $S_i$ will probably find some best policy $\pi_k = \underset{\pi_i \in \Pi}{\operatorname{argmax}} \mathbb E [U_k | S_i, \pi_i ](https://arbital.com/p/)$ that happens to be inside the partition $X$. If instead in situation $S_k$...
where a stray (https://arbital.com/p/) comes out of nowhere.
My guess is that this is also a problem in the current dataset, though, since I don't see how my new code would have changed that behaviour. Basically, formulas that contain square brackets confuse the parser.
That's MathJax code, so it should be fine. It might even be worth adding MathJax to the chatbot and seeing if it can generate pretty equations?
You might be misunderstanding what I meant when I quoted that result. Somewhere in that answer (which I think corresponds to https://arbital.com/p/instrumental_convergence/) there's some MathJax code, but our parser sees its square brackets and treats them like a link ([123
One thing I'm considering: if parse_arbital_link tries to create a link that is empty (i.e. (https://arbital.com/p/)), we just ignore it, close the bracket as though it were a real bracket, and go on our merry way.
Except there seem to be two cases that I don't know how to distinguish. Either there's math in the brackets, in which case we want to keep the brackets as they are and show their contents, or there's an unfinished link, i.e. something like [pseudoconsequentialist], where someone links to a page that hasn't been created yet, in the hope that it gets created later and the placeholder is automatically replaced by a real link. In that second case, we probably want to remove the brackets. Does that make sense?
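A minimal sketch of both ideas, assuming the math is always delimited by $...$ or $$...$$ as in the quoted output. The names math_spans, inside_math, render_link and ARBITAL_PAGE_BASE are hypothetical and not part of the existing parser: a bracket that falls inside a math region is treated as math, and a link whose page part is empty falls back to plain brackets instead of producing (https://arbital.com/p/).

```python
from typing import List, Optional, Tuple

# Assumed base URL, taken from the broken output quoted above.
ARBITAL_PAGE_BASE = "https://arbital.com/p/"


def math_spans(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of $...$ and $$...$$ regions in text."""
    spans: List[Tuple[int, int]] = []
    delim: Optional[str] = None  # currently open math delimiter, if any
    start = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # Skip the escaped character so that \$ is not read as a delimiter.
            i += 2
            continue
        if ch == "$":
            if delim is None:
                # Opening delimiter: prefer $$ (display math) over $ (inline math).
                delim = "$$" if text.startswith("$$", i) else "$"
                start = i
                i += len(delim)
                continue
            if text.startswith(delim, i):
                # Matching closing delimiter ends the current math region.
                i += len(delim)
                spans.append((start, i))
                delim = None
                continue
        i += 1
    return spans


def inside_math(text: str, pos: int) -> bool:
    """True if the character at text[pos] lies inside a math region."""
    return any(start <= pos < end for start, end in math_spans(text))


def render_link(page: str, label: str) -> str:
    """Hypothetical fallback: an empty page means the brackets were not a link."""
    if not page.strip():
        # Put the brackets back instead of emitting (https://arbital.com/p/).
        return f"[{label}]"
    return f"[{label}]({ARBITAL_PAGE_BASE}{page})"
```

For example, in r"find $\pi_k = \mathbb E [U_k | S_i, \pi_i ]$ and see [pseudoconsequentialist]." the first bracket is reported as inside math and the second is not, so only the second would go through the normal link handling. This heuristic does nothing for math that isn't $-delimited.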
For some more flavorful examples of this method of using Bayes' rule, see [https://www.gwern.net/docs/statistics/1994-falk The ups and downs of the hope function in a fruitless search].
becomes
For some more flavorful examples of this method of using Bayes' rule, see [The ups and downs of the hope function in a fruitless search](https://arbital.com/p/https://www.gwern.net/docs/statistics/1994-falk).
Handled an additional edge case in parse_arbital_link.
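The gwern example above shows an external URL being prefixed with the Arbital base URL. A rough sketch of one way to avoid that, assuming the link body is "target label" split at the first space as in that example; convert_bracket_link and ARBITAL_PAGE_BASE are hypothetical names, and this is not the actual parse_arbital_link code.

```python
# Assumed base URL for internal Arbital pages.
ARBITAL_PAGE_BASE = "https://arbital.com/p/"


def convert_bracket_link(body: str) -> str:
    """Convert the inside of an Arbital-style [target label] link to Markdown."""
    target, _, label = body.strip().partition(" ")
    if target.startswith(("http://", "https://")):
        url = target  # external link: use the URL unchanged
    else:
        url = ARBITAL_PAGE_BASE + target  # internal Arbital page
    # With no label, fall back to showing the target itself.
    return f"[{label.strip() or target}]({url})"
```

Checking the scheme of the first token keeps the rule simple; anything that is not an http(s) URL is still treated as an Arbital page.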
I'll make another PR that changes the entries passed to make_data_entry; I guess it'll be summaries: List[str] then? It feels like it used to be that, though, so do you remember why you switched it back? I suppose giving each entry a single "summary" seems a bit more intuitive than giving it a list of many summaries, but if we allow an article to have many summaries, we might as well let them all be created from the same entry.
Most things that have summaries (e.g. arXiv or the Alignment Newsletter) only have a single summary, so it was easier to do it that way, which of course breaks here :D
That makes sense. I suppose I could just replace the "summary" key with "summaries" and the string with [string], but that makes it a bit less obvious to use. Alternatively, the "summary" key could accept either List[str] or str, and make_data_entry would check whether it's an instance of str or list and handle it accordingly; that's maybe the most flexible, but maybe also more confusing or error-prone? Not sure. Or there could be two keys, "summary" holding a str and "summaries" holding a List[str], assuming only one is given. In make_data_entry, the "summary" string would be turned into a List[str], [string], and appended to "summaries" or whatever. That's more flexible, but it means two different keys depending on whether you pass the summary as a single string or as a list of strings.
I'd go with checking for both keys, and even going so far as to merge them if both are provided.
Yeah, makes sense. Will add that shortly.
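A minimal sketch of the "check both keys and merge them" approach, as a helper that make_data_entry could call on its input. merge_summary_keys and _as_list are hypothetical names; the only assumption about entries is that they are dicts that may carry a "summary" str and/or a "summaries" list.

```python
from typing import Any, Dict, Iterable, List, Optional, Union


def _as_list(value: Optional[Union[str, Iterable[str]]]) -> List[str]:
    """Wrap a single string in a list, pass lists through, treat None as empty."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def merge_summary_keys(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of entry with 'summary' and 'summaries' merged into one list."""
    result = dict(entry)  # copy so the caller's dict is not mutated
    merged = _as_list(result.pop("summary", None)) + _as_list(result.pop("summaries", None))
    result["summaries"] = [s for s in merged if s]  # drop empty strings
    return result
```

For example, merge_summary_keys({"summary": "a", "summaries": ["b"]}) gives {"summaries": ["a", "b"]}, so callers can use whichever key is more natural and downstream code only ever sees a list.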
Modified a few functions, especially markdownify_text, so that summaries are saved as summaries rather than as part of the text.
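A rough sketch of pulling summaries out of the page text, assuming Arbital marks them with blocks such as [summary: ...] or [summary(Technical): ...]. That syntax and the name split_summaries are assumptions, not taken from markdownify_text itself. Brackets are matched by depth so that a summary containing a link stays whole.

```python
from typing import List, Tuple


def split_summaries(text: str) -> Tuple[str, List[str]]:
    """Remove [summary...: ...] blocks from text; return (remaining_text, summaries).

    Assumes Arbital-style summary blocks; nested [...] inside a summary are kept.
    """
    summaries: List[str] = []
    kept: List[str] = []
    i = 0
    while i < len(text):
        if text.startswith("[summary", i):
            # Find the bracket that closes this block, tracking nesting depth.
            depth = 0
            end = i
            while end < len(text):
                if text[end] == "[":
                    depth += 1
                elif text[end] == "]":
                    depth -= 1
                    if depth == 0:
                        break
                end += 1
            colon = text.find(":", i, end)
            if end < len(text) and colon != -1:
                summaries.append(text[colon + 1:end].strip())
                i = end + 1
                continue
        # Not a complete summary block: keep the character as ordinary text.
        kept.append(text[i])
        i += 1
    return "".join(kept).strip(), summaries
```

The returned summaries list could then be stored under the entry's summaries key, leaving only the body in the text.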