Regex discards important text

The scraper misdetects article text and discards important data. For instance, scraping Cedom's bill 400 yields the following result:

{
"sancion": "01/06/2000",
"publicacion": "BOCBA N° 989 del 21/07/2000",
"promulgacion": "De Hecho del 03/07/2000",
"_id": {
  "$oid": "520a8ec68be1e20000000002"
},
"articulos": [
  {
    "articulo": "</b> Proh&iacute;bese a los establecimientos educativos ",
    "_id": {
      "$oid": "520a8ec68be1e20000000008"
    }
  },
  {
    "articulo": "</b> Ning&uacute;n alumno, con motivo de mora en el ",
    "_id": {
      "$oid": "520a8ec68be1e20000000007"
    }
  },
  {
    "articulo": " </b>los alumnos de los establecimientos citados ",
    "_id": {
      "$oid": "520a8ec68be1e20000000006"
    }
  },
  {
    "articulo": "</b> De verse configurados los extremos descriptos en ",
    "_id": {
      "$oid": "520a8ec68be1e20000000005"
    }
  },
  {
    "articulo": "</b> La Secretar&iacute;a de Educaci&oacute;n podr&aacute; ",
    "_id": {
      "$oid": "520a8ec68be1e20000000004"
    }
  },
  {
    "articulo": "</b> Comun&iacute;quese, etc</P>",
    "_id": {
      "$oid": "520a8ec68be1e20000000003"
    }
  }
],
"__v": 0
}

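As you can see, the text of each article is incomplete: every entry is cut short and still carries HTML remnants such as `</b>`, `</P>` and unescaped entities.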

In other cases, such as when part of the text contains double quotes (e.g. "Some text"), all of the article's text up to that point is also discarded.

As a general rule, ALL text between two consecutive article titles should be included as part of the article that the first title opens.
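
To make that rule concrete, here is a minimal sketch in Python (not the scraper's actual code). It assumes each article title appears in the page as a bold heading of the form `<b>Artículo N ...</b>`; that shape is only inferred from the stray `</b>` in the output above and may not match every Cedom page. The names `HEADING`, `TAG` and `split_articles` are hypothetical. The idea is to locate the headings only, and take everything between one heading and the next as the article body, so quotes or other punctuation inside the body cannot truncate it.

```python
import html
import re

# Assumed heading shape: <b>Artículo 1º.-</b> (the accent may arrive as &iacute;).
HEADING = re.compile(r"<b>\s*Art(?:&iacute;|[íi])culo\s+(\d+)[^<]*</b>", re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def split_articles(raw_html):
    """Return one dict per article, keeping ALL text between consecutive headings."""
    matches = list(HEADING.finditer(raw_html))
    articles = []
    for i, match in enumerate(matches):
        # The body runs from the end of this heading to the start of the next one
        # (or to the end of the document for the last article).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw_html)
        body = raw_html[match.end():end]
        body = TAG.sub(" ", body)                    # drop leftover tags such as </P>
        body = html.unescape(body)                   # &iacute; -> í, &quot; -> "
        body = re.sub(r"\s+", " ", body).strip()     # collapse whitespace
        articles.append({"numero": int(match.group(1)), "articulo": body})
    return articles
```

Because the body is sliced by position rather than captured by a pattern that must describe its contents, double quotes inside an article are kept. One limitation of this sketch: a bold in-text reference to another article would be mistaken for a heading.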

DemocracyOS / bill-scraper