MIT-AI-Accelerator / c3po-model-server

filter ACARS messages prior to DB upload (block in, takeoff, landing) #224

Open emiliecowen opened 1 month ago

emiliecowen commented 1 month ago

@lshinggg to provide list

emiliecowen commented 1 month ago

[11:40 AM] Shing, Leslie - 0552 - MITLL For issue #224 https://github.com/orgs/MIT-AI-Accelerator/projects/3/views/3?pane=issue&itemId=71426201 -- I uploaded my ACARS addition in a separate notebook on the topic modeling transition repo (I mentioned adding ACARS for topic modeling to Chase, but do not plan on sending the notebook as I'm not sure how they are planning to access that data source)

https://llcad-github.llan.ll.mit.edu/AF-Alexa/TransitionPackage_TopicSummarization/blob/main/(CUI)%20mitll_transition_topicsum_plus_ACARS.ipynb -- 5 code blocks down, "Get ACARS Messages", my filtering is the line: if len(jobj['text']) == 0 or "from aircraft's ACARS system" in jobj['text']: continue

[11:41 AM] Cowen, Emilie - 0552 - MITLL oh okay, so you're letting the block out messages with tail number info through

[11:41 AM] Cowen, Emilie - 0552 - MITLL think we might still want stop words for that

[11:42 AM] Shing, Leslie - 0552 - MITLL yah I was finding those messages with the tail numbers too noisy since most looked like automated messages

[11:42 AM] Cowen, Emilie - 0552 - MITLL yes I see them too

[11:42 AM] Shing, Leslie - 0552 - MITLL for my experiments I was more interested in seeing what others were free text typing

[11:43 AM] Cowen, Emilie - 0552 - MITLL do we want these block outs in db? I think yes because it will be helpful for query retrieval

[11:43 AM] Shing, Leslie - 0552 - MITLL side question: do you know what "block out" means?

[11:43 AM] Cowen, Emilie - 0552 - MITLL I think you're actually dropping them because these data appear in "fields"

[11:44 AM] Cowen, Emilie - 0552 - MITLL https://www.airliners.net/forum/viewtopic.php?t=457389

[11:44 AM] Cowen, Emilie - 0552 - MITLL it probably correlates closely with takeoff

[11:44 AM] Cowen, Emilie - 0552 - MITLL so query retrieval can use it to answer the has my aircraft taken off question

[11:45 AM] Cowen, Emilie - 0552 - MITLL since we're filtering out acars takeoff by default

[11:45 AM] Shing, Leslie - 0552 - MITLL ah gotcha. so you get the exact timing of the departure. that's true. yes it would be important for query retrieval then. but I think still too noisy for topic modeling

[11:45 AM] Cowen, Emilie - 0552 - MITLL this is getting gross then

[11:45 AM] Cowen, Emilie - 0552 - MITLL but it is what it is

[11:46 AM] Cowen, Emilie - 0552 - MITLL I'll have to think about how to implement the ppg filter

[11:46 AM] Shing, Leslie - 0552 - MITLL yah... I'm not sure the best solution to approach this.

[11:46 AM] Cowen, Emilie - 0552 - MITLL I think letting those through and adding stopwords would work, but that is more research for you

[11:48 AM] Cowen, Emilie - 0552 - MITLL you know what - how about I'll let them through (anything with content in "fields" even if "text" is null), and continue to run on all non-mission channels with "acars block" in stop words and let you know what happens

[11:48 AM] Shing, Leslie - 0552 - MITLL that sounds good
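
For reference, a rough sketch of the two filters discussed above, side by side. This is not the notebook or server code: the function names, the constant name, and the sample messages are made up for illustration, and it assumes each ACARS message is a dict (`jobj`) with a `text` key and a `fields` key, as the chat suggests. The first function mirrors the quoted notebook condition (inverted, since the notebook uses `continue` to skip); the second mirrors the "anything with content in fields even if text is null" proposal for DB upload.

```python
# Marker string taken from the notebook filter quoted in the chat.
AUTOMATED_MARKER = "from aircraft's ACARS system"


def keep_for_topic_modeling(jobj):
    """Keep only free-text messages (hypothetical helper).

    Drops messages with empty/null text and messages containing the
    automated marker, matching the notebook's skip condition.
    """
    text = jobj.get("text") or ""
    return len(text) > 0 and AUTOMATED_MARKER not in text


def keep_for_db_upload(jobj):
    """Keep anything with content in "text" or "fields" (hypothetical helper).

    Block-out style messages that only populate "fields" pass through,
    so query retrieval can still see them.
    """
    return bool(jobj.get("text")) or bool(jobj.get("fields"))


# Made-up sample messages; the real field layout is an assumption.
messages = [
    {"text": "", "fields": {"event": "block out"}},
    {"text": "Block out report from aircraft's ACARS system", "fields": {"event": "block out"}},
    {"text": "crew requests fuel truck", "fields": {}},
    {"text": None, "fields": None},
]

topic_modeling_mask = [keep_for_topic_modeling(m) for m in messages]
db_upload_mask = [keep_for_db_upload(m) for m in messages]
```

With this split, the block-out messages would reach the DB for query retrieval but stay out of topic modeling unless they are let through deliberately and handled with stop words instead.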