SneaksAndData / arcane-framework

Akka.NET-based framework for data streaming services using the Arcane Kubernetes Operator
Apache License 2.0
5 stars 2 forks source link

Decompose schema validation from sources and sinks. #13

Closed s-vitaliy closed 7 months ago

s-vitaliy commented 7 months ago

Part of https://github.com/SneaksAndData/arcane-framework/issues/5

Scope

The pull request introduces the ability to use any Akka.NET-based source as a source for an Arcane stream. In an effort to decouple schema validation logic from the source and sink, thereby making the code more modular and facilitating the use of sources from third-party libraries, two wrapper interfaces have been introduced: ISchemaBoundSource and ISchemaFreeSource, along with their respective sinks.

Additionally, the introduction of these types allows the compiler to statically check that the source in the Arcane Stream uses the same schema as the sink does.

Additionally, this pull request introduces the ISchemaValidator interface. This can be used to validate that the data flow matches the schema between any stages, such as the output of ArcaneSource or the input of ArcaneSink

Allowed sink and source connection types

flowchart LR
    schemaFreeSource["`ISchemaFreeSource`"]
    schemaFreeSink["`ISchemaFreeSink`"]
    schemaBoundSource["`ISchemaBoundSource`"]
    schemaBoundSink["`ISchemaBoundSink`"]

    schemaFreeSource--> schemaFreeSink
    schemaBoundSource--> schemaFreeSink
    schemaBoundSource--> schemaBoundSink

Allowed conversions between schema bound sinks and sources

flowchart LR
    schemaFreeSource["`ISchemaFreeSource`"]
    schemaBoundSource["`ISchemaBoundSource`"]
    schemaBoundSink["`ISchemaBoundSink`"]
    schemaFreeSink["`ISchemaFreeSink`"]
    mapNode[`Map`]

    schemaBoundSource --> mapNode --> schemaFreeSource & schemaBoundSource & schemaBoundSink & schemaFreeSink
flowchart LR
    schemaFreeSource["`ISchemaFreeSource`"]
    schemaBoundSource["`ISchemaBoundSource`"]
    schemaFreeSink["`ISchemaFreeSink`"]
    mapNode[`Map`]
    withSchema["`withSchema`"]

    schemaFreeSource-->schemaFreeSink
    schemaFreeSource--> mapNode --> schemaFreeSource
    schemaFreeSource --> withSchema--> schemaBoundSource

Checklist

github-actions[bot] commented 7 months ago

Coverage after merging schema-validation-refactoring into main will be

58.70%

Coverage Report
github-actions[bot] commented 7 months ago

Coverage after merging schema-validation-refactoring into main will be

FileStmtsBranchesFuncsLinesUncovered Lines
src/Metrics/Models
   SourceTags.cs0%100%0%0%13, 18, 23, 30–37
src/Services
   StreamRunnerService.cs0%0%0%0%30–40, 44–49, 53–54, 54, 54–56, 59–61
src/Services/Base
   IStreamConfigurationProvider.cs0%100%0%0%11, 16
src/Sinks
   SchemaBoundSink.cs0%100%0%0%19–23, 28, 31
   SchemaFreeSink.cs100%100%100%100%
src/Sinks/Extensions
   SchemaFreeSourceExtensions.cs0%100%0%0%21
src/Sinks/Json
   JsonSink.cs74.66%42.86%100%77.68%102–105, 149–151, 151, 151, 151, 151, 151, 154–156, 158–159, 161–162, 164, 166, 177, 177–180, 82, 82, 82, 82, 84–86, 96–99
   MultilineJsonSink.cs79.12%52%100%81.69%113, 113, 113, 113, 115–117, 127–130, 133–136, 206–208, 208, 208, 208, 208, 208, 211–213, 215–216, 218–219, 221, 223, 234, 234–238
src/Sinks/Parquet
   ParquetSink.cs83.26%64.10%100%85.88%114, 114, 114, 114, 116–118, 134–137, 161, 212–215, 252–254, 254, 254, 254, 254, 254, 257–259, 261–262, 264–265, 267, 269, 280, 280, 286–288, 45
   ParquetOperations.cs81.01%70.83%100%84.31%100–102, 102, 102, 102, 102, 105–107, 159–161, 62, 65, 69, 71–72, 89, 89, 89, 92–96, 96, 96, 98–99
src/Sinks/Parquet/Models
   DataCell.cs100%100%100%100%
src/Sources
   SchemaBoundSource.cs33.33%100%22.22%40%25, 28, 32, 39, 43–45, 49, 52
   SchemaFreeSource.cs64.71%100%66.67%63.64%28, 40–42
src/Sources/CdmChangeFeedSource
   CdmChangeFeedSource.cs0%0%0%0%114–115, 115, 115, 115, 115–118, 121, 121, 121, 121, 121–129, 133–135, 138–141, 143–144, 155, 159–163, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165–175, 178–179, 182, 185, 188–190, 190, 190–193, 195–198, 201–202, 202, 202, 206, 206, 206–209, 211–212, 214–215, 217, 220–223, 223, 223–224, 228–231, 233–236, 236, 236–238, 240, 240, 240–243, 247, 247, 247, 247, 247–252, 252, 252–257, 260–268, 270, 270, 270, 272–273, 275–276, 278–280, 283, 285–298, 301–348, 351–352, 352, 352, 352, 352–355, 357, 357, 357–359, 359, 359–363, 365–367, 369–371, 374–379, 379, 379, 379, 379–386, 386, 386–389, 391–395, 395, 395, 395, 395–400, 402–403, 406, 408–416, 46–64, 67, 72, 75, 79–81, 85–92
src/Sources/CdmChangeFeedSource/Extensions
   CsvOperations.cs90.22%97.22%33.33%88.68%54, 85–87, 95–97
   JsonDocumentOperations.cs0%100%0%0%18–24, 35–40, 49–53
   SimpleCdmAttributeExtensions.cs0%0%0%0%19–23, 23, 23, 23, 23–30, 30, 30, 30, 30–41, 41, 41–46, 49–53, 53, 53–55, 58–59
src/Sources/CdmChangeFeedSource/Models
   SimpleCdmAttribute.cs0%0%0%0%106–108, 108, 108–111, 111, 111–113, 116–118, 123–124, 124, 124–126, 129, 129, 129–131, 134, 134, 134–136, 139, 139, 139, 14, 140–141, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144–149, 15, 150, 153–156, 16–29, 35, 41, 47, 53, 59, 65, 70–71, 79–81, 90–91, 91, 91–93, 96–97
   SimpleCdmEntity.cs0%0%0%0%101, 101, 101–103, 106, 106, 106, 106, 106–109, 112–114, 18, 23, 28, 33–34, 42–56, 58, 60–61, 69–70, 72, 72, 72–75, 77, 79–80, 85–86, 86, 86–88, 91, 91, 91–93, 96, 96, 96–98
src/Sources/Extensions
   GraphStageLogicExtensions.cs100%100%100%100%
   SinkExtensions.cs100%100%100%100%
   SourceExtensions.cs8.33%0%50%5%44–45, 45, 45–62
   src/Metrics/Models
   SourceTags.cs0%100%0%0%13, 18, 23, 30–37
src/Services
   StreamRunnerService.cs0%0%0%0%30–40, 44–49, 53–54, 54, 54–56, 59–61
src/Services/Base
   IStreamConfigurationProvider.cs0%100%0%0%11, 16
src/Sinks
   SchemaBoundSink.cs100%100%100%100%
   SchemaFreeSink.cs100%100%100%100%
src/Sinks/Extensions
   SchemaFreeSourceExtensions.cs100%100%100%100%
src/Sinks/Json
   JsonSink.cs74.66%42.86%100%77.68%102–105, 149–151, 151, 151, 151, 151, 151, 154–156, 158–159, 161–162, 164, 166, 177, 177–180, 82, 82, 82, 82, 84–86, 96–99
   MultilineJsonSink.cs79.12%52%100%81.69%113, 113, 113, 113, 115–117, 127–130, 133–136, 206–208, 208, 208, 208, 208, 208, 211–213, 215–216, 218–219, 221, 223, 234, 234–238
src/Sinks/Parquet
   ParquetSink.cs83.26%64.10%100%85.88%114, 114, 114, 114, 116–118, 134–137, 161, 212–215, 252–254, 254, 254, 254, 254, 254, 257–259, 261–262, 264–265, 267, 269, 280, 280, 286–288, 45
   ParquetOperations.cs74.68%66.67%87.50%77.45%100–102, 102, 102, 102, 102, 105–107, 159–161, 61–62, 62, 62–65, 65, 65–67, 69, 71–72, 89, 89, 89, 92–96, 96, 96, 98–99
src/Sinks/Parquet/Models
   DataCell.cs100%100%100%100%
src/Sources
   SchemaBoundSource.cs50%100%33.33%60%25, 28, 32, 36, 39, 49
   SchemaFreeSource.cs88.24%100%83.33%90.91%28
src/Sources/CdmChangeFeedSource
   CdmChangeFeedSource.cs0%0%0%0%114–115, 115, 115, 115, 115–118, 121, 121, 121, 121, 121–129, 133–135, 138–141, 143–144, 155, 159–163, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165–175, 178–179, 182, 185, 188–190, 190, 190–193, 195–198, 201–202, 202, 202, 206, 206, 206–209, 211–212, 214–215, 217, 220–223, 223, 223–224, 228–231, 233–236, 236, 236–238, 240, 240, 240–243, 247, 247, 247, 247, 247–252, 252, 252–257, 260–268, 270, 270, 270, 272–273, 275–276, 278–280, 283, 285–298, 301–348, 351–352, 352, 352, 352, 352–355, 357, 357, 357–359, 359, 359–363, 365–367, 369–371, 374–379, 379, 379, 379, 379–386, 386, 386–389, 391–395, 395, 395, 395, 395–400, 402–403, 406, 408–416, 46–64, 67, 72, 75, 79–81, 85–92
src/Sources/CdmChangeFeedSource/Extensions
   CsvOperations.cs90.22%97.22%33.33%88.68%54, 85–87, 95–97
   JsonDocumentOperations.cs0%100%0%0%18–24, 35–40, 49–53
   SimpleCdmAttributeExtensions.cs0%0%0%0%19–23, 23, 23, 23, 23–30, 30, 30, 30, 30–41, 41, 41–46, 49–53, 53, 53–55, 58–59
src/Sources/CdmChangeFeedSource/Models
   SimpleCdmAttribute.cs0%0%0%0%106–108, 108, 108–111, 111, 111–113, 116–118, 123–124, 124, 124–126, 129, 129, 129–131, 134, 134, 134–136, 139, 139, 139, 14, 140–141, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144–149, 15, 150, 153–156, 16–29, 35, 41, 47, 53, 59, 65, 70–71, 79–81, 90–91, 91, 91–93, 96–97
   SimpleCdmEntity.cs0%0%0%0%101, 101, 101–103, 106, 106, 106, 106, 106–109, 112–114, 18, 23, 28, 33–34, 42–56, 58, 60–61, 69–70, 72, 72, 72–75, 77, 79–80, 85–86, 86, 86–88, 91, 91, 91–93, 96, 96, 96–98
src/Sources/Extensions
   GraphStageLogicExtensions.cs100%100%100%100%
   SinkExtensions.cs100%100%100%100%
   SourceExtensions.cs8.33%0%50%5%44–45, 45, 45–62
   SqlServerUtils.cs100%100%100%100%
   SchemaFreeSourceExtensions.cs100%100%100%100%
59.53%

Coverage Report
github-actions[bot] commented 7 months ago

Coverage after merging schema-validation-refactoring into main will be

FileStmtsBranchesFuncsLinesUncovered Lines
src/Metrics/Models
   SourceTags.cs0%100%0%0%13, 18, 23, 30–37
src/Services
   StreamRunnerService.cs0%0%0%0%30–40, 44–49, 53–54, 54, 54–56, 59–61
src/Services/Base
   IStreamConfigurationProvider.cs0%100%0%0%11, 16
src/Sinks
   SchemaBoundSink.cs100%100%100%100%
   SchemaFreeSink.cs100%100%100%100%
src/Sinks/Extensions
   SchemaFreeSourceExtensions.cs100%100%100%100%
src/Sinks/Json
   JsonSink.cs74.66%42.86%100%77.68%102–105, 149–151, 151, 151, 151, 151, 151, 154–156, 158–159, 161–162, 164, 166, 177, 177–180, 82, 82, 82, 82, 84–86, 96–99
   MultilineJsonSink.cs79.12%52%100%81.69%113, 113, 113, 113, 115–117, 127–130, 133–136, 206–208, 208, 208, 208, 208, 208, 211–213, 215–216, 218–219, 221, 223, 234, 234–238
src/Sinks/Parquet
   ParquetSink.cs83.26%64.10%100%85.88%114, 114, 114, 114, 116–118, 134–137, 161, 212–215, 252–254, 254, 254, 254, 254, 254, 257–259, 261–262, 264–265, 267, 269, 280, 280, 286–288, 45
   ParquetOperations.cs74.68%66.67%87.50%77.45%100–102, 102, 102, 102, 102, 105–107, 159–161, 61–62, 62, 62–65, 65, 65–67, 69, 71–72, 89, 89, 89, 92–96, 96, 96, 98–99
src/Sinks/Parquet/Models
   DataCell.cs100%100%100%100%
src/Sources
   SchemaBoundSource.cs50%100%33.33%60%25, 28, 32, 36, 39, 49
   SchemaFreeSource.cs88.24%100%83.33%90.91%28
src/Sources/CdmChangeFeedSource
   CdmChangeFeedSource.cs0%0%0%0%114–115, 115, 115, 115, 115–118, 121, 121, 121, 121, 121–129, 133–135, 138–141, 143–144, 155, 159–163, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165–175, 178–179, 182, 185, 188–190, 190, 190–193, 195–198, 201–202, 202, 202, 206, 206, 206–209, 211–212, 214–215, 217, 220–223, 223, 223–224, 228–231, 233–236, 236, 236–238, 240, 240, 240–243, 247, 247, 247, 247, 247–252, 252, 252–257, 260–268, 270, 270, 270, 272–273, 275–276, 278–280, 283, 285–298, 301–348, 351–352, 352, 352, 352, 352–355, 357, 357, 357–359, 359, 359–363, 365–367, 369–371, 374–379, 379, 379, 379, 379–386, 386, 386–389, 391–395, 395, 395, 395, 395–400, 402–403, 406, 408–416, 46–64, 67, 72, 75, 79–81, 85–92
src/Sources/CdmChangeFeedSource/Extensions
   CsvOperations.cs90.22%97.22%33.33%88.68%54, 85–87, 95–97
   JsonDocumentOperations.cs0%100%0%0%18–24, 35–40, 49–53
   SimpleCdmAttributeExtensions.cs0%0%0%0%19–23, 23, 23, 23, 23–30, 30, 30, 30, 30–41, 41, 41–46, 49–53, 53, 53–55, 58–59
src/Sources/CdmChangeFeedSource/Models
   SimpleCdmAttribute.cs0%0%0%0%106–108, 108, 108–111, 111, 111–113, 116–118, 123–124, 124, 124–126, 129, 129, 129–131, 134, 134, 134–136, 139, 139, 139, 14, 140–141, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144–149, 15, 150, 153–156, 16–29, 35, 41, 47, 53, 59, 65, 70–71, 79–81, 90–91, 91, 91–93, 96–97
   SimpleCdmEntity.cs0%0%0%0%101, 101, 101–103, 106, 106, 106, 106, 106–109, 112–114, 18, 23, 28, 33–34, 42–56, 58, 60–61, 69–70, 72, 72, 72–75, 77, 79–80, 85–86, 86, 86–88, 91, 91, 91–93, 96, 96, 96–98
src/Sources/Extensions
   GraphStageLogicExtensions.cs100%100%100%100%
   SinkExtensions.cs100%100%100%100%
   SourceExtensions.cs8.33%0%50%5%44–45, 45, 45–62
   SqlServerUtils.cs100%100%100%100%
   SchemaFreeSourceExtensions.cs100%100%100%100%
59.53%

Coverage Report
FileStmtsBranchesFuncsLinesUncovered Lines
src/Metrics/Models
   SourceTags.cs0%100%0%0%13, 18, 23, 30–37
src/Services
   StreamRunnerService.cs0%0%0%0%30–40, 44–49, 53–54, 54, 54–56, 59–61
src/Services/Base
   IStreamConfigurationProvider.cs0%100%0%0%11, 16
src/Sinks
   SchemaBoundSink.cs100%100%100%100%
   SchemaFreeSink.cs100%100%100%100%
src/Sinks/Extensions
   SchemaFreeSourceExtensions.cs100%100%100%100%
src/Sinks/Json
   JsonSink.cs74.66%42.86%100%77.68%102–105, 149–151, 151, 151, 151, 151, 151, 154–156, 158–159, 161–162, 164, 166, 177, 177–180, 82, 82, 82, 82, 84–86, 96–99
   MultilineJsonSink.cs79.12%52%100%81.69%113, 113, 113, 113, 115–117, 127–130, 133–136, 206–208, 208, 208, 208, 208, 208, 211–213, 215–216, 218–219, 221, 223, 234, 234–238
src/Sinks/Parquet
   ParquetSink.cs83.26%64.10%100%85.88%114, 114, 114, 114, 116–118, 134–137, 161, 212–215, 252–254, 254, 254, 254, 254, 254, 257–259, 261–262, 264–265, 267, 269, 280, 280, 286–288, 45
   ParquetOperations.cs74.68%66.67%87.50%77.45%100–102, 102, 102, 102, 102, 105–107, 159–161, 61–62, 62, 62–65, 65, 65–67, 69, 71–72, 89, 89, 89, 92–96, 96, 96, 98–99
src/Sinks/Parquet/Models
   DataCell.cs100%100%100%100%
src/Sources
   SchemaBoundSource.cs50%100%33.33%60%25, 28, 32, 36, 39, 49
   SchemaFreeSource.cs88.24%100%83.33%90.91%28
src/Sources/CdmChangeFeedSource
   CdmChangeFeedSource.cs0%0%0%0%114–115, 115, 115, 115, 115–118, 121, 121, 121, 121, 121–129, 133–135, 138–141, 143–144, 155, 159–163, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165–175, 178–179, 182, 185, 188–190, 190, 190–193, 195–198, 201–202, 202, 202, 206, 206, 206–209, 211–212, 214–215, 217, 220–223, 223, 223–224, 228–231, 233–236, 236, 236–238, 240, 240, 240–243, 247, 247, 247, 247, 247–252, 252, 252–257, 260–268, 270, 270, 270, 272–273, 275–276, 278–280, 283, 285–298, 301–348, 351–352, 352, 352, 352, 352–355, 357, 357, 357–359, 359, 359–363, 365–367, 369–371, 374–379, 379, 379, 379, 379–386, 386, 386–389, 391–395, 395, 395, 395, 395–400, 402–403, 406, 408–416, 46–64, 67, 72, 75, 79–81, 85–92
src/Sources/CdmChangeFeedSource/Extensions
   CsvOperations.cs90.22%97.22%33.33%88.68%54, 85–87, 95–97
   JsonDocumentOperations.cs0%100%0%0%18–24, 35–40, 49–53
   SimpleCdmAttributeExtensions.cs0%0%0%0%19–23, 23, 23, 23, 23–30, 30, 30, 30, 30–41, 41, 41–46, 49–53, 53, 53–55, 58–59
src/Sources/CdmChangeFeedSource/Models
   SimpleCdmAttribute.cs0%0%0%0%106–108, 108, 108–111, 111, 111–113, 116–118, 123–124, 124, 124–126, 129, 129, 129–131, 134, 134, 134–136, 139, 139, 139, 14, 140–141, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144, 144–149, 15, 150, 153–156, 16–29, 35, 41, 47, 53, 59, 65, 70–71, 79–81, 90–91, 91, 91–93, 96–97
   SimpleCdmEntity.cs0%0%0%0%101, 101, 101–103, 106, 106, 106, 106, 106–109, 112–114, 18, 23, 28, 33–34, 42–56, 58, 60–61, 69–70, 72, 72, 72–75, 77, 79–80, 85–86, 86, 86–88, 91, 91, 91–93, 96, 96, 96–98
src/Sources/Extensions
   GraphStageLogicExtensions.cs100%100%100%100%
   SinkExtensions.cs100%100%100%100%
   SourceExtensions.cs8.33%0%50%5%44–45, 45, 45–62
   SqlServerUtils.cs100%100%100%100%
   SchemaFreeSourceExtensions.cs100%100%100%100%