SciCrunch / sparc-curation

code and files for SPARC curation workflows
MIT License

dataset template workflow cannot preserve rich text cells #119

Open tgbugs opened 1 month ago

tgbugs commented 1 month ago

It is not possible to combine openpyxl and LibreOffice Calc (localc) in one workflow and preserve inline formatting for cells. This is because openpyxl has dropped support for xl/sharedStrings.xml, while localc always saves xlsx files using xl/sharedStrings.xml, even if the inline formatting was originally saved in xl/worksheets/sheet1.xml. This currently affects code_description.xlsx, which has lost some of its formatting.
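To see which of the two storage forms a given file uses, one option is to look inside the xlsx zip archive directly. The following is a minimal sketch using only the Python standard library. The function name `string_storage` and the synthetic archive in the demo are hypothetical and not part of sparc-curation, and the demo XML is a stripped-down stand-in for a real worksheet part, not output from either tool.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# SpreadsheetML main namespace used by both sheet and sharedStrings parts.
MAIN = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'
NS = '{' + MAIN + '}'


def string_storage(xlsx):
    """Summarize where an xlsx file keeps its string cells.

    xlsx may be a path or a binary file-like object.
    """
    summary = {
        'shared_strings_part': False,  # is xl/sharedStrings.xml present?
        'rich_runs_in_shared': 0,      # <r> runs inside sharedStrings.xml
        'inline_string_cells': 0,      # <c t="inlineStr"> cells in sheets
        'rich_runs_in_sheets': 0,      # <r> runs inside worksheet parts
    }
    with zipfile.ZipFile(xlsx) as zf:
        for name in zf.namelist():
            if name == 'xl/sharedStrings.xml':
                root = ET.fromstring(zf.read(name))
                summary['shared_strings_part'] = True
                summary['rich_runs_in_shared'] = sum(
                    1 for _ in root.iter(NS + 'r'))
            elif name.startswith('xl/worksheets/') and name.endswith('.xml'):
                root = ET.fromstring(zf.read(name))
                summary['inline_string_cells'] += sum(
                    1 for c in root.iter(NS + 'c')
                    if c.get('t') == 'inlineStr')
                summary['rich_runs_in_sheets'] += sum(
                    1 for _ in root.iter(NS + 'r'))
    return summary


# Demo on a synthetic archive holding one inline rich text cell (B7).
sheet_xml = (
    '<worksheet xmlns="' + MAIN + '"><sheetData><row r="7">'
    '<c r="B7" t="inlineStr"><is>'
    '<r><t>Valid values are: </t></r>'
    '<r><rPr><b/></rPr><t>Link</t></r>'
    '</is></c>'
    '</row></sheetData></worksheet>'
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('xl/worksheets/sheet1.xml', sheet_xml)
buf.seek(0)
demo = string_storage(buf)
print(demo)
```

Running `string_storage` on a file before and after it passes through each tool should show whether the rich text runs moved between the worksheet part and xl/sharedStrings.xml.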

There is nothing to do about this right now other than point both projects at this issue and note that it is currently impossible to round-trip data when using both of their tools.

tgbugs commented 1 month ago

The test code needed in datasets.Tabular._openpyxl_fixes to illustrate the issue:

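            # test only: rebuild the rich text value of B7 in code_description.xlsx
            if self.path.stem == 'code_description':
                import openpyxl.cell.rich_text as rt
                import openpyxl.cell.text as tx
                b7 = _sheet.cell(7, 2)  # row 7, column 2, i.e. B7
                def bold(text, cell=b7):
                    # start from the cell's own font so only the weight changes
                    kwargs = {**cell.font.__dict__}
                    # InlineFont calls the font name rFont rather than name
                    kwargs['rFont'] = kwargs.pop('name')
                    kwargs['b'] = True
                    return rt.TextBlock(tx.InlineFont(**kwargs), text)

                # same as bold but sets italic; not used in the value below
                def italic(text, cell=b7):
                    kwargs = {**cell.font.__dict__}
                    kwargs['rFont'] = kwargs.pop('name')
                    kwargs['i'] = True
                    return rt.TextBlock(tx.InlineFont(**kwargs), text)

                # plain lead-in text followed by the bolded valid values
                t = rt.CellRichText('Column type. Valid values are: ', bold('Link'), ', ', bold('Text'), ', ', bold('Rating'), ', ', bold('Target'), ', ', bold('Target Justification'))
                b7.value = t
                #breakpoint()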