Open almonds0166 opened 2 months ago
Grammar for detecting events (corresponds to the HAS_EVENT_PREDICATE_FUNCTION schema):
const HAS_EVENT_PREDICATE_GRAMMAR = dedent`
boolean ::= ("true" | "false") space
char ::= [^"\\\\\\x7F\\x00-\\x1F] | [\\\\] (["\\\\bfnrt] | "u" [0-9a-fA-F]{4})
has-event-kv ::= "\\"has_event\\"" space ":" space boolean
has-event-rest ::= ( "," space rejected-reason-kv )?
rejected-reason-kv ::= "\\"rejected_reason\\"" space ":" space string
root ::= "{" space (has-event-kv has-event-rest | rejected-reason-kv )? "}" space
space ::= | " " | "\\n" [ \\t]{0,20}
string ::= "\\"" char* "\\"" space`
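For illustration, a reply constrained by this grammar could be parsed on the TypeScript side roughly as follows. This is only a sketch: the `HasEventResult` type and the `parseHasEventResult` helper are hypothetical names, not part of the existing code. Both keys are treated as optional because the `root` rule also accepts an empty object.

```typescript
// Hypothetical shape of a has-event reply. Every key is optional because
// the grammar's root rule allows "{}" as well.
interface HasEventResult {
  has_event?: boolean;
  rejected_reason?: string;
}

// Parses a grammar-constrained reply and keeps only the keys the grammar
// can produce, with the types the grammar guarantees.
function parseHasEventResult(raw: string): HasEventResult {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Expected a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  const result: HasEventResult = {};
  if (typeof obj["has_event"] === "boolean") {
    result.has_event = obj["has_event"];
  }
  if (typeof obj["rejected_reason"] === "string") {
    result.rejected_reason = obj["rejected_reason"];
  }
  return result;
}
```

The type checks are defensive: the grammar should already guarantee them, but they keep the helper safe if the grammar is ever omitted from a request.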
Grammar for extracting the events (corresponds to the EXTRACT_FUNCTION schema):
const EXTRACT_GRAMMAR = dedent`
char ::= [^"\\\\\\x7F\\x00-\\x1F] | [\\\\] (["\\\\bfnrt] | "u" [0-9a-fA-F]{4})
events ::= "[" space (events-item ("," space events-item)*)? "]" space
events-item ::= "{" space (events-item-title-kv events-item-title-rest | events-item-time-in-the-day-kv events-item-time-in-the-day-rest | events-item-date-time-kv events-item-date-time-rest | events-item-duration-kv events-item-duration-rest | events-item-location-kv events-item-location-rest | events-item-organizer-kv )? "}" space
events-item-date-time-kv ::= "\\"date_time\\"" space ":" space string
events-item-date-time-rest ::= ( "," space events-item-duration-kv )? events-item-duration-rest
events-item-duration-kv ::= "\\"duration\\"" space ":" space integer
events-item-duration-rest ::= ( "," space events-item-location-kv )? events-item-location-rest
events-item-location-kv ::= "\\"location\\"" space ":" space string
events-item-location-rest ::= ( "," space events-item-organizer-kv )?
events-item-organizer-kv ::= "\\"organizer\\"" space ":" space string
events-item-time-in-the-day-kv ::= "\\"time_in_the_day\\"" space ":" space string
events-item-time-in-the-day-rest ::= ( "," space events-item-date-time-kv )? events-item-date-time-rest
events-item-title-kv ::= "\\"title\\"" space ":" space string
events-item-title-rest ::= ( "," space events-item-time-in-the-day-kv )? events-item-time-in-the-day-rest
events-kv ::= "\\"events\\"" space ":" space events
integer ::= ("-"? integral-part) space
integral-part ::= [0] | [1-9] [0-9]{0,15}
rejected-reason-kv ::= "\\"rejected_reason\\"" space ":" space string
rejected-reason-rest ::= ( "," space events-kv )?
root ::= "{" space (rejected-reason-kv rejected-reason-rest | events-kv )? "}" space
space ::= | " " | "\\n" [ \\t]{0,20}
string ::= "\\"" char* "\\"" space`
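The keys this grammar can emit map onto a pair of TypeScript types. The sketch below is an assumption about how the consuming code might look; `ExtractedEvent`, `ExtractResult`, and `eventsFromReply` are hypothetical names. It trusts the grammar for field types and only handles the case where `events` is absent.

```typescript
// Hypothetical types mirroring the keys in EXTRACT_GRAMMAR. Every key is
// optional because each rule in the grammar allows it to be omitted.
interface ExtractedEvent {
  title?: string;
  time_in_the_day?: string;
  date_time?: string;
  duration?: number; // an integer per the grammar; the unit is not stated there
  location?: string;
  organizer?: string;
}

interface ExtractResult {
  rejected_reason?: string;
  events?: ExtractedEvent[];
}

// Returns the events from a grammar-constrained reply, or an empty list
// when the reply has no "events" key (for example, a rejection).
function eventsFromReply(raw: string): ExtractedEvent[] {
  const result = JSON.parse(raw) as ExtractResult;
  return result.events ?? [];
}
```

One detail worth knowing: `integral-part` allows up to 16 digits, which slightly exceeds JavaScript's safe integer range, though that is unlikely to matter for realistic durations.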
Talking to the SIPB LLMs endpoints works the same way as before (see talk.py). For example, we could have:
async function doCompletion(prompt: string, grammar: string): Promise<string> {
  try {
    const response = await fetch(SIPB_LLMS_API_ENDPOINT, {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${SIPB_LLMS_API_TOKEN}`,
        "Content-Type": `application/json`,
      },
      body: JSON.stringify({
        "messages": [
          {"role": "user", "content": prompt},
        ],
        "stream": false,
        "tokenize": true,
        "stop": ["</s>", "### User Message", "### Assistant", "### Prompt"],
        "cache_prompt": false,
        "frequency_penalty": 0,
        "grammar": grammar,
        "image_data": [],
        //"model": "mixtral",
        "min_p": 0.05,
        "mirostat": 0,
        "mirostat_eta": 0.1,
        "mirostat_tau": 5,
        //"n_predict": 1000,
        "n_probs": 0,
        "presence_penalty": 0,
        "repeat_last_n": 256,
        "repeat_penalty": 1.18,
        "seed": -1,
        "slot_id": -1,
        "temperature": 0.7,
        "tfs_z": 1,
        "top_k": 40,
        "top_p": 0.95,
        "typical_p": 1,
      }),
    });
    if (!response.ok)
      throw new Error(`HTTP error: ${response.status}`);
    const data = await response.json();
    return data["choices"][0]["message"]["content"];
  } catch (error) {
    console.error(`Error with completion:`, error);
    throw error;
  }
}
Those stop tokens may need workshopping. The relevant file is llm/emailToEvents.ts.
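To show how the two grammars might fit together with a completion function of this shape, here is a rough sketch of a two-step flow: a has-event check first, then extraction. The flow itself, the `emailToEvents` name, and the prompt wording are all assumptions for illustration, not the project's actual prompts or code. The completion function is passed in as a parameter so the flow can be exercised without a network call.

```typescript
// Same signature as doCompletion; injected so the flow can be tested
// with a stub instead of a real endpoint.
type Complete = (prompt: string, grammar: string) => Promise<string>;

async function emailToEvents(
  email: string,
  complete: Complete,
  grammars: { hasEvent: string; extract: string },
): Promise<unknown[]> {
  // Step 1: yes/no check, constrained by the has-event grammar.
  // The prompt text is a placeholder, not the project's real prompt.
  const verdict = JSON.parse(
    await complete(`Does this email announce an event?\n\n${email}`, grammars.hasEvent),
  );
  if (verdict.has_event !== true) return [];

  // Step 2: extraction, constrained by the extract grammar.
  const extracted = JSON.parse(
    await complete(`Extract the events from this email.\n\n${email}`, grammars.extract),
  );
  return Array.isArray(extracted.events) ? extracted.events : [];
}
```

With the definitions above, a call could look like `emailToEvents(text, doCompletion, { hasEvent: HAS_EVENT_PREDICATE_GRAMMAR, extract: EXTRACT_GRAMMAR })`.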
A difficulty with migrating to SIPB LLMs is that there is not yet an easy way to develop DormSoup locally, which may deserve its own GitHub Issue. For starters, the most convenient approach may be to adapt the testEmailToEventsPrompt.ts script to read from a folder of plaintext emails (e.g., https://github.mit.edu/sipb/dormdigest-emails) instead of connecting to the inbox.
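A folder-based loader for that script could be as small as the sketch below. It assumes a Node.js environment and a flat folder with one plaintext email per file; the actual layout of the emails repository is not known here, and `loadEmailsFromFolder` and `LocalEmail` are hypothetical names.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// One email read from disk: the file name and its full text.
interface LocalEmail {
  file: string;
  text: string;
}

// Reads every regular file directly inside `dir` as UTF-8 text,
// skipping subdirectories, in sorted file-name order.
function loadEmailsFromFolder(dir: string): LocalEmail[] {
  return fs
    .readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile())
    .map((entry) => entry.name)
    .sort()
    .map((file) => ({
      file,
      text: fs.readFileSync(path.join(dir, file), "utf8"),
    }));
}
```

The test script could then loop over the returned list and feed each `text` to the prompt under test, in place of messages fetched from the inbox.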
Some progress on the sipb-llms branch: c2aa4d038ec43530cabe221640ce836e2cb9eb59
Also e9d84cc3ebf340c991c38c417207ddd7fc28c526
Environment variables need to be added to .env: SIPB_LLMS_API_ENDPOINT, SIPB_LLMS_API_TOKEN
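To fail early when either variable is missing, a small check along these lines could run at startup. This is a sketch under assumptions: `readSipbLlmsConfig` and `SipbLlmsConfig` are hypothetical names, and the environment is passed in as a plain object so the helper does not depend on how .env gets loaded.

```typescript
interface SipbLlmsConfig {
  endpoint: string;
  token: string;
}

// Reads the two SIPB LLMs variables from an environment-like object and
// throws an error naming whichever ones are missing or empty.
function readSipbLlmsConfig(env: Record<string, string | undefined>): SipbLlmsConfig {
  const endpoint = env["SIPB_LLMS_API_ENDPOINT"];
  const token = env["SIPB_LLMS_API_TOKEN"];
  if (!endpoint || !token) {
    const missing = [
      !endpoint ? "SIPB_LLMS_API_ENDPOINT" : null,
      !token ? "SIPB_LLMS_API_TOKEN" : null,
    ].filter((name) => name !== null);
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return { endpoint, token };
}
```

In a Node.js process this would typically be called as `readSipbLlmsConfig(process.env)` once the .env file has been loaded.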
MIT emails and dormspam event data are classified as medium-risk information, and sending this information to proprietary services is a privacy concern at the least and a violation of MIT policies (e.g., 11.0, 13.2) at worst. While dormspam is a large, decentralized email ecosystem that already lives on Microsoft services (Exchange/Outlook), privacy concerns remain valid.
To address these concerns, this issue tracks our migration from ChatGPT to SIPB LLMs.
The process involves rewriting the backend's OpenAI API calls to use the Fetch API against the SIPB LLMs endpoints instead (e.g., our Mixtral model). JSON schema structuring is accomplished with GBNF grammars instead, which are already written.