dreamproit / bill-similarity

Calculate similarity of bill documents using a variety of NLP approaches

Get more bills #3

Open dmytro-ustynov opened 2 years ago

dmytro-ustynov commented 2 years ago

The repository initially included several XML files, such as samples/congress/116/uslm/BILLS-116hconres9enr.xml, along with a parsing script that extracts the sections from each bill. However, that script doesn't seem to work with the other bills in the set we download via the congress tool. So the main question is: how (or where) can I get more bills, preferably the whole set, that I can split into sections for further work? Maybe (this is just a suggestion) we should adapt the parsing script so that it can parse that set? Or is there a transformation step that I haven't found yet?

Anyway, the main point is to get more bills so that we can extract more sections from them.

dmytro-ustynov commented 2 years ago

@aih, what do you think?

aih commented 2 years ago

The parser works with uslm, which requires namespaces in the XPath for lxml. To use the downloaded files directly, comment that out and use XPath without namespaces. Let me know if you try that with a file and you have trouble.
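
To illustrate the difference, here is a minimal sketch of looking up sections with and without a namespace. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml so it runs without extra dependencies; lxml's `xpath()` takes a similar `namespaces` mapping. The two inline documents, the namespace URI, and the `find_sections` helper are assumptions made up for this example, not the repository's actual parser or real bill files, so check the `xmlns` on the root element of the files you have.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed samples. The namespace URI below is an
# assumption for illustration; real files declare their own on the root.
USLM_SAMPLE = """<bill xmlns="http://schemas.gpo.gov/xml/uslm">
  <section><heading>Short title</heading></section>
  <section><heading>Definitions</heading></section>
</bill>"""

PLAIN_SAMPLE = """<bill>
  <legis-body>
    <section><header>Short title</header></section>
  </legis-body>
</bill>"""


def find_sections(xml_text):
    """Return all <section> elements, whether or not the root is namespaced."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        # ElementTree stores a namespaced tag as "{uri}localname".
        uri = root.tag[1:root.tag.index("}")]
        return root.findall(".//ns:section", {"ns": uri})
    # No namespace declared: a plain path is enough.
    return root.findall(".//section")


# A plain path finds nothing in the namespaced document, which is the
# kind of silent mismatch described above.
unqualified_hits = ET.fromstring(USLM_SAMPLE).findall(".//section")

print(len(unqualified_hits))
print(len(find_sections(USLM_SAMPLE)))
print(len(find_sections(PLAIN_SAMPLE)))
```

Detecting the namespace from the root tag avoids keeping two copies of the parsing script, one per file format, though element names other than `section` may still differ between the two formats.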