alphanome-ai / sec-ai

A comprehensive open-source toolkit for AI-powered analysis and interpretation of SEC EDGAR filings, providing valuable insights for investors, fintech developers, and researchers.
https://sec.alphanome.app
MIT License
112 stars, 24 forks

Make HighlightedTextClassifier work with `<b>` tags #55

Closed: Elijas closed this issue 11 months ago

Elijas commented 1 year ago

Example document

https://www.sec.gov/Archives/edgar/data/1675149/000119312518236766/d828236d10q.htm

```html
 <p style="margin-top:9pt; margin-bottom:0pt; text-indent:4%; font-size:10pt; font-family:Times New Roman">
  Options to purchase 1 million shares of common stock at a weighted average exercise price of $36.28 were
outstanding as of June 30, 2017, but were not included in the computation of diluted EPS because they were anti-dilutive, as the exercise prices of the options were greater than the average market price of Alcoa Corporation’s common stock.
 </p>
 <p style="margin-top:13pt; margin-bottom:0pt; font-size:10pt; font-family:Times New Roman">
  <b>
   G. Accumulated Other Comprehensive Loss
  </b>
 </p>
 <p style="margin-top:6pt; margin-bottom:0pt; text-indent:4%; font-size:10pt; font-family:Times New Roman">
  The following table details the activity of the three components that comprise Accumulated other comprehensive loss for both Alcoa
Corporation’s shareholders and Noncontrolling interest:
 </p>
```

Goal

The "G. Accumulated Other Comprehensive Loss" heading should be recognized as a HighlightedTextElement (and therefore as a TitleElement).

Most likely, you will need to compute the percentage of the text that is covered by the `<b>` tag, reusing the parts already implemented in HighlightedTextElement. This avoids situations where `text text text <b>bold</b> text text` is recognized as highlighted.
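A minimal sketch of the percentage-of-bold-text idea, using only Python's standard `html.parser`. The names `bold_text_ratio` and `is_highlighted`, the inclusion of `<strong>` alongside `<b>`, and the 0.8 threshold are all hypothetical assumptions for illustration; they are not part of sec-parser's API, and a real fix would reuse whatever HighlightedTextElement already provides.

```python
from html.parser import HTMLParser

# Hypothetical threshold (assumption): an element counts as highlighted only if
# at least 80% of its visible, non-whitespace characters are inside bold tags.
BOLD_RATIO_THRESHOLD = 0.8

# Assumption: treat <strong> the same as <b>.
BOLD_TAGS = {"b", "strong"}


class _BoldTextCounter(HTMLParser):
    """Counts non-whitespace characters, split by whether they sit inside a bold tag."""

    def __init__(self) -> None:
        super().__init__()
        self.bold_depth = 0  # > 0 while inside one or more nested bold tags
        self.bold_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in BOLD_TAGS:
            self.bold_depth += 1

    def handle_endtag(self, tag):
        if tag in BOLD_TAGS and self.bold_depth > 0:
            self.bold_depth -= 1

    def handle_data(self, data):
        # Skip whitespace so indentation and line breaks in the HTML do not skew the ratio.
        n = sum(1 for ch in data if not ch.isspace())
        self.total_chars += n
        if self.bold_depth > 0:
            self.bold_chars += n


def bold_text_ratio(html: str) -> float:
    """Return the fraction of visible characters in `html` that are bold (0.0 if no text)."""
    counter = _BoldTextCounter()
    counter.feed(html)
    counter.close()
    if counter.total_chars == 0:
        return 0.0
    return counter.bold_chars / counter.total_chars


def is_highlighted(html: str, threshold: float = BOLD_RATIO_THRESHOLD) -> bool:
    """True if (nearly) all of the element's text is wrapped in bold tags."""
    return bold_text_ratio(html) >= threshold


if __name__ == "__main__":
    title = "<p><b>G. Accumulated Other Comprehensive Loss</b></p>"
    body = "<p>text text text <b>bold</b> text text</p>"
    print(bold_text_ratio(title), is_highlighted(title))  # 1.0 True
    print(is_highlighted(body))  # False (only 4 of 24 characters are bold)
```

Measuring coverage rather than merely checking for the presence of a `<b>` tag is what keeps a paragraph with a single bold word from being promoted to a title.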

HarikaB11 commented 1 year ago

Hi @Elijas , I would like to work on this issue

Elijas commented 1 year ago

Awesome! 🚀

If you have any questions or if we can support you in any way, feel free to join our #sec-parser channel for community support, in addition to using this GitHub issue 🙌

lchauha commented 1 year ago

@Elijas I would like to work on this issue

Elijas commented 1 year ago

@HarikaB11 and @lchauha, could you please introduce yourselves in the #sec-parser channel so the efforts can be coordinated?

Just a simple "Hi, I'm Harika/Lchauha from GitHub" and we'll go from there 👍

Elijas commented 1 year ago

@HarikaB11 and @lchauha, working uncoordinated (i.e., the first good pull request gets accepted and the issue gets closed) is also OK if we as a community agree to it.

But that may discourage contributors who have already started working on a task that then gets closed, so I'd recommend participating in the channel 🚀

HarikaB11 commented 1 year ago

@lchauha, I have already started working on it and will submit a PR by tomorrow. Please look into other issues.

INF800 commented 1 year ago

@HarikaB11 @lchauha In case you haven't noticed, there is a weekly community meeting scheduled today. Feel free to join and catch up with fellow developers:

Link to meeting message on Discord

We'll be having a meeting to discuss the current state and near-term direction of Alphanome.AI projects, and answer questions from the community. Join in!

Time: 2023-11-28 4:30-5:30PM IST

Google Meet joining info
Video call link: https://meet.google.com/rbj-dnew-dny
Or dial: https://tel.meet/rbj-dnew-dny?pin=1700196255356

Elijas commented 1 year ago

Hey @HarikaB11 and @lchauha

Can you share a bit about your intended plan for solving the issue? I would love to collaborate with you on the solution 🙌

Thanks!

Elijas commented 11 months ago

Hey, let's sync up on Discord if you'd still like to contribute to this task, as we're putting it next on the roadmap for the internal team to work on 🚀