glourencoffee / pycvm

Python library for processing data from CVM
MIT License
2 stars 0 forks source link

Error while reading DFP/2020: "got empty string at field 'COLUNA_DF'" #9

Closed glourencoffee closed 2 years ago

glourencoffee commented 2 years ago

Description

Reading the DFP of 2020 raises the exception MissingValueError with the following message: "got empty string at field 'COLUNA_DF'".

Steps to reproduce

  1. Download the DFP/2020 from CVM's Data Portal here
  2. Read DFPITR documents with dfpitr_reader()
  3. See error

Expected behavior

No error while reading DMPL statements.

Actual behavior

Exception is raised:

>>> import cvm
>>> reader = cvm.csvio.dfpitr_reader('dfp_cia_aberta_2020.zip')
>>> for doc in reader:
...    print(doc)
...
<RegularDocument: id=100120 type=Demonstrações Fiscais Padronizadas version=1 CNPJ=00.000.000/0001-91>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\projects\cvm\cvm\csvio\dfpitr.py", line 338, in _zip_reader
    con_statements = read_statements_mapped_by_type(con_readers)
  File "d:\projects\cvm\cvm\csvio\dfpitr.py", line 326, in read_statements_mapped_by_type
    statements = stmt_reader.read(head_batch_id)
  File "d:\projects\cvm\cvm\csvio\dfpitr.py", line 95, in read
    stmts = self.read_statements(batch)
  File "d:\projects\cvm\cvm\csvio\dfpitr.py", line 129, in read_statements
    statement = self.create_statement(rows)
  File "d:\projects\cvm\cvm\csvio\dfpitr.py", line 211, in create_statement
    column  = row.required('COLUNA_DF', str)
  File "d:\projects\cvm\cvm\csvio\row.py", line 24, in required
    raise MissingValueError(f"got empty string at field '{fieldname}'")
cvm.exceptions.MissingValueError: got empty string at field 'COLUNA_DF'
glourencoffee commented 2 years ago

Analyzing the single CSV file for the consolidated DMPL by company "Banco do Brasil" and comparing it against the same document on the CVM website, I noticed this problem is not caused by this library but rather by missing data in the DMPL file. Here are the steps to reproduce it:

  1. Head to the CVM webpage that shows the DFP/2020 by Banco do Brasil
  2. Select "DFs Consolidadas"
  3. Select "Demonstração das Mutações do Patrimônio Líquido"
  4. Select tab "01/01/2020 a 31/12/2020"
  5. Notice how there are exactly 9 columns for account quantities, namely: 5.1. Capital Social Integralizado 5.2. Reservas de Capital, Opções Outorgadas e Ações em Tesouraria 5.3. Reservas de Lucro 5.4. Ajustes de Avaliação Patrimonial 5.5. Lucros ou Prejuízos Acumulados 5.6. Outros Resultados Abrangentes 5.7. Patrimônio Líquido - Acionistas Controladores 5.8. Patrimônio Líquido - Acionistas Não Controladores 5.9. Total do Patrimônio Líquido Consolidado
  6. Now, extract the DFP/2020 ("dfp_cia_aberta_2020.zip") into a folder "2020"
  7. Under "2020", open the CSV file for the consolidated DMPL ("dfp_cia_aberta_DMPL_con_2020.csv") with a spreadsheet editor, such as LibreOffice Calc
  8. Select the first consecutive rows with value "Saldos Iniciais" at column "DS_CONTA"
  9. Copy these rows into a new spreadsheet
  10. In the new spreadsheet, remove rows with value "PENÚLTIMO" at column "ORDEM_EXERC"
  11. Remove all columns before "COLUNA_DF" for the sake of cleanness
  12. The resulting spreadsheet shall look like the following table:
COLUNA_DF CD_CONTA DS_CONTA VL_CONTA ST_CONTA_FIXA
Capital Social Integralizado 5.01 Saldos Iniciais 75100000 S
Reservas de Capital, Opções Outorgadas e Ações em Tesouraria 5.01 Saldos Iniciais 6253263 S
Reservas de Lucro 5.01 Saldos Iniciais 53814656 S
Lucros ou Prejuízos Acumulados 5.01 Saldos Iniciais 0 S
Outros Resultados Abrangentes 5.01 Saldos Iniciais -4046637 S
Patrimônio Líquido 5.01 Saldos Iniciais -23386544 S
Participação dos Não Controladores 5.01 Saldos Iniciais 107734738 S
Patrimônio Líquido Consolidado 5.01 Saldos Iniciais 2236641 S
  5.01 Saldos Iniciais 109971379 S

Transposing that table to make it similar to that of CVM's website results in the following:

Código Descrição Capital Social Integralizado Reservas de Capital, Opções Outorgadas e Ações em Tesouraria Reservas de Lucro Lucros ou Prejuízos Acumulados Outros Resultados Abrangentes Patrimônio Líquido Participação dos Não Controladores Patrimônio Líquido Consolidado  
5.01 Saldos Iniciais 75100000 6253263 53814656 0 -4046637 -23386544 107734738 2236641 109971379

Notice how there is no column "Ajustes de Avaliação Patrimonial", a column which does exist in the CVM's website. Also notice how the name of the last column is empty.

If a column "Ajustes de Avaliação Patrimonial" was inserted after "Reservas de Lucro" and the name of the next columns were shifted, the resulting table would match the one in CVM's website:

Código Descrição Capital Social Integralizado Reservas de Capital, Opções Outorgadas e Ações em Tesouraria Reservas de Lucro Ajustes de Avaliação Patrimonial Lucros ou Prejuízos Acumulados Outros Resultados Abrangentes Patrimônio Líquido Participação dos Não Controladores Patrimônio Líquido Consolidado
5.01 Saldos Iniciais 75100000 6253263 53814656 0 -4046637 -23386544 107734738 2236641 109971379

The conclusion is that this bug is not caused by this library by rather by an incorrect generation of CSV files on CVM's side. One workaround this library may do is insert a column "Ajustes de Avaliação Patrimonial" when reading DMPL of DFP/2020.

glourencoffee commented 2 years ago

This bug is also confirmed in DFP/2021 for both consolidated and individual balances. Here are the steps to get the individual balances of Banco do Brasil for 2021:

  1. Download the financial statements of 2021 by Banco do Brasil given on its IR website

  2. Jump to page 9

  3. See two tables, the first for "BB Banco Múltiplo" (individual balances) and the second for "BB Consolidado" (consolidated balances)

  4. See the following table with individual balances: bb-2021-dmpl-individual

  5. Head to the CVM's webpage that shows BB's statements for the year of 2021

  6. Select "DFs Individuais"

  7. Select "Demonstração das Mutações do Patrimônio Líquido"

  8. Select tab "01/01/2021 a 31/12/2021"

  9. See the following table: bb-2021-dmpl-individual-cvm

Notice how data from both sources match. The only unmatching columns are "Reservas de Capital" (on BB's file) and "Reservas de Capital, Opções Outorgadas e Ações em Tesouraria" (on CVM's webpage), but this is because they are not entirely equivalent. If we sum up the values of "Reservas de Capital" and "Ações em Tesouraria" (on BB's file), that gives the same number as that of CVM's column:

1.397.697 + (-279.187) = 1.118.510

Now, download the DFP/2021 at CVM's Data Portal, extract it, and open the CSV file for the individual DMPL ("dfp_cia_aberta_DMPL_ind_2021.csv"), just like it was done in the previous comment. The data contained therein shall result in the following transposed table:

Código Descrição Capital Social Integralizado Reservas de Capital, Opções Outorgadas e Ações em Tesouraria Reservas de Lucro Lucros ou Prejuízos Acumulados Outros Resultados Abrangentes Patrimônio Líquido Participação dos Não Controladores
5.01 Saldos Iniciais 90000023 1118510 39454038 2040 0 -13851389 116723222

Again, inserting the column "Ajustes de Avaliação Patrimonial" and shifting the columns to the right solves this problem. It seems the column "Ajustes de Avaliação Patrimonial" was introduced in 2020 for DMPL statements, but generation of DFP/ITR files hasn't accounted for it properly.

glourencoffee commented 2 years ago

I have cleaned up CVM's ass by fixing this, but ideally, quantities in DMPL files should be assigned to the correct columns. It is possible that DMPL files get fixed in the future, in which case a new issue may be created to undo the changes implemented by this issue.