flow-php / flow

Flow PHP - data processing framework
https://flow-php.com
MIT License

Simple XML example not working on Windows 10 / PHP 8.2.10 #1011

Closed faridjc closed 3 months ago

faridjc commented 3 months ago

Hello! I'm new to Flow-PHP. I'm trying to run the XML example and I get this error:

Fatal error: Uncaught Exception: Unknown scheme "C"

My code:

declare(strict_types=1);

use function Flow\ETL\Adapter\XML\from_xml;
use function Flow\ETL\DSL\{data_frame, ref, to_output};

require __DIR__ . '/path/to/autoload.php';

data_frame()
    ->read(from_xml(
        __DIR__ . '/Data.xml',
        ''
    ))
    ->write(to_output(false))
    ->run();

I'd appreciate any help.

norberttech commented 3 months ago

Hey, could you please also provide a sample of this XML file? I need to check whether it's a Windows-related or a file-related problem.

faridjc commented 3 months ago

Sure,

<?xml version="1.0" encoding="UTF-8"?>
<Unit>
    <DealerID/>
    <InventoryId>2222</InventoryId>
    <StockNumber>3EERT44</StockNumber>
    <Location>Trailer</Location>
    <Status>On Order</Status>
    <VehicleType>Air Ride Trailer</VehicleType>
    <NUD>N</NUD>
    <Year>2024</Year>
    <Make>A4</Make>
    <Model>GJ890</Model>
    <ModelNo>467890</ModelNo>
    <VIN/>
    <SecSerial/>
    <Odometer/>
    <ExtColor>Black</ExtColor>
    <IntColor/>
    <BodyStyle>Open</BodyStyle>
    <FuelType/>
    <PurchaseDate>1900-01-01T00:00:00</PurchaseDate>
    <DOL>0</DOL>
    <ExpectedDate>1900-01-01T00:00:00</ExpectedDate>
    <SalesDesc>Description</SalesDesc>
    <AddOnCost>0.00</AddOnCost>
    <PurchaseCost>0.00</PurchaseCost>
    <TotalCost>0.00</TotalCost>
    <Price>0.00</Price>
    <ClearancePrice>0.00</ClearancePrice>
    <WebPrice>0.00</WebPrice>
    <RedTagPrice>0.00</RedTagPrice>
    <Length>48'</Length>
    <Width>99"</Width>
    <Height/>
    <Weight/>
    <GVW>0</GVW>
    <Slides/>
    <Sleeps/>
    <Transmission/>
    <Driveline/>
    <Doors></Doors>
    <Seats/>
    <Driveline/>
    <Other/>
    <MaxLoad/>
    <Hitch/>
    <Dist/>
    <Engine/>
    <HorsePower/>
    <DriveNumber/>
    <TransNumber/>
    <MotorType> </MotorType>
    <MotorBuild/>
    <Brakes/>
    <Axles/>
    <Condition>New</Condition>
    <LastChanged>2023-06-02T10:42:18.333</LastChanged>
</Unit>

norberttech commented 3 months ago

Hmm, I was able to read that file with no issues, even though Unit should most likely be nested under another node, like:

<Units>
   <Unit>...</Unit>
   <Unit>...</Unit>
</Units>

Could you please try running the same code on a Unix machine? Is there a stack trace for this error? I have never seen anything like that, and I don't remember any Flow error that could produce a similar message.

faridjc commented 3 months ago

Thanks for your response. I don't have a Unix machine to run the code on.

This is the trace:

Fatal error: Uncaught Exception: Unknown scheme "C" in C:\Users...\vendor\flow-php\etl\src\Flow\ETL\Filesystem\Path.php on line 44

Call stack:

Flow\E\F\Path::__construct() .../vendor/flow-php/etl/src/Flow/ETL/Filesystem/Path.php:90
Flow\E\F\Path::realpath() .../vendor/flow-php/etl-adapter-xml/src/Flow/ETL/Adapter/XML/functions.php:38
Flow\E\A\X\from_xml()

I did a little digging, the exception is thrown by this block:

if (\array_key_exists('scheme', $urlParts) && !\in_array($urlParts['scheme'], \stream_get_wrappers(), true)) {
     throw new InvalidArgumentException("Unknown scheme \"{$urlParts['scheme']}\"");
}

One of the values returned by stream_get_wrappers is "file", which is what $urlParts['scheme'] should be in this case. For that to happen, the URI should be "file://C:\Users....", but it's just "C:\Users...". So when parse_url is called, "C" becomes the scheme (instead of "file").

Passing "file://" to the function from_xml does not solve it.
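
The scheme detection described above can be reproduced in isolation with plain PHP, without Flow. This is only a minimal sketch: the path below is a made-up example, and the variable names are illustrative rather than taken from Flow's Path class.

```php
<?php

declare(strict_types=1);

// Hypothetical Windows path, used only to illustrate the behaviour.
$windowsPath = 'C:\Users\me\Data.xml';

// parse_url() treats the text before the first ":" as the scheme,
// so the drive letter ends up under the "scheme" key.
$urlParts = \parse_url($windowsPath);
$scheme = $urlParts['scheme'] ?? null;

// The drive letter is not a registered stream wrapper, while "file" is,
// which is why a check against stream_get_wrappers() rejects the path.
$isKnownWrapper = \in_array($scheme, \stream_get_wrappers(), true);

\var_dump($scheme);
\var_dump($isKnownWrapper);
```

Running this on any platform should show the drive letter reported as the scheme and the wrapper check failing, which matches the "Unknown scheme" exception in the trace.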

norberttech commented 3 months ago

Thanks, that's actually super helpful. I think I know exactly what's wrong here. Give me some time to play around with this; I might be able to fix it.

faridjc commented 3 months ago

Thank you!

norberttech commented 3 months ago

I just tried something like this, and it seems to be working fine. Could you try preparing the path like that? (Sorry, I'm not very familiar with Windows. I'm trying to set up a Windows dev environment, but it seems to be a bit more complicated than I expected.)

    public function test_windows_paths() : void
    {
        self::assertEquals(
            'file://C:/path/to/file.csv',
            (new Path('file://C:/path/to/file.csv'))->uri()
        );
    }

faridjc commented 3 months ago

I tried the following:

data_frame()
    ->read(from_xml(
        (new Path('file://C:/Users/.../Data.xml'))->uri(),
        ''
    ))
    ->write(to_output(false))
    ->run();

And got the following:

Fatal error: Uncaught Error: Call to undefined method Flow\ETL\Filesystem\FilesystemStreams::scan() in C:\Users...\vendor\flow-php\etl-adapter-xml\src\Flow\ETL\Adapter\XML\XMLReaderExtractor.php on line 43

Call stack:

Flow\E\A\X\XMLReaderExtractor::extract()
Generator::valid() .../vendor/flow-php/etl/src/Flow/ETL/Pipeline/SynchronousPipeline.php:65
Flow\E\P\SynchronousPipeline::process() .../vendor/flow-php/etl/src/Flow/ETL/DataFrame.php:770
Flow\ETL\DataFrame::run()

Which is the same error I got when I tried manually adding "file://" to the URI just before its scheme was checked against stream_get_wrappers.

scan is a method of LocalFilesystem, not FilesystemStreams, and the FilesystemStreams instance is being picked up from $context->streams(). That's as far as I got trying to understand the issue.

norberttech commented 3 months ago

Fatal error: Uncaught Error: Call to undefined method Flow\ETL\Filesystem\FilesystemStreams::scan()

I don't like this part. Could you show me your composer.json (only the flow-php related parts)? Your dependencies might be out of sync.

norberttech commented 3 months ago

I think I managed to get it to work. Here is the code that I used on Windows (took me a while to set it up 😂) to read that XML file (I simply skipped the C:/ part in the file path):

<?php

use Flow\ETL\Filesystem\Path;
use function Flow\ETL\Adapter\XML\from_xml;
use function Flow\ETL\DSL\df;
use function Flow\ETL\DSL\to_output;

require __DIR__ . '/vendor/autoload.php';

df()
    ->read(from_xml(new Path("file://Users/norbert/Workspace/flow-php/flow/file.xml")))
    ->write(to_output(true))
    ->run();

norberttech commented 3 months ago

Hey @faridjc, any luck with the suggested approach?

faridjc commented 3 months ago

Hello @norberttech! No luck trying your example there. I keep getting this error:

Fatal error: Uncaught Error: Call to undefined method Flow\ETL\Filesystem\FilesystemStreams::scan()

Now it has to be something in my environment.

This is my composer.json (relevant parts)

{
    "require": {
        "flow-php/etl-adapter-xml": "^0.7.0",
        "flow-php/etl-adapter-json": "^0.7.0",
        "flow-php/etl-adapter-http": "^0.7.0"
    }
}

norberttech commented 3 months ago

Could you please also explicitly add "flow-php/etl": "^0.7.0"? It seems that you don't have the latest version of the ETL core, which is weird, but let's try.
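
One way to check which versions Composer actually installed is its runtime API. The following is a minimal sketch, assuming Composer 2 and that vendor/autoload.php has already been required; the helper name installed_versions is hypothetical and not part of Flow.

```php
<?php

declare(strict_types=1);

// Assumption: inside a real project, vendor/autoload.php was required
// before this point, so Composer's runtime class can be autoloaded.

/**
 * Hypothetical helper: maps package names to their installed versions,
 * or null when a package (or Composer's runtime API) is unavailable.
 *
 * @param list<string> $packages
 * @return array<string, ?string>
 */
function installed_versions(array $packages): array
{
    $versions = [];

    foreach ($packages as $package) {
        $versions[$package] = null;

        if (\class_exists(\Composer\InstalledVersions::class)
            && \Composer\InstalledVersions::isInstalled($package)
        ) {
            $versions[$package] = \Composer\InstalledVersions::getPrettyVersion($package);
        }
    }

    return $versions;
}

$report = installed_versions(['flow-php/etl', 'flow-php/etl-adapter-xml']);

foreach ($report as $package => $version) {
    echo $package, ': ', $version ?? 'not installed', PHP_EOL;
}
```

If the core package reports an older version than the adapter, that would be consistent with the adapter calling a method the installed core does not provide.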

faridjc commented 3 months ago

I finally got it to read my XML with this composer.json config:

"flow-php/etl":"1.x-dev",
"flow-php/etl-adapter-xml": "^0.7.0"

etl-adapter-xml requires flow-php/etl ^0.6.0 || 1.x-dev, so it wouldn't let me use ^0.7.0.

Thank you so much for all your help!

norberttech commented 3 months ago

etl-adapter-xml requires flow-php/etl ^0.6.0 || 1.x-dev, so it wouldn't let me use ^0.7.0.

🤦‍♂️ I forgot to bump the dependencies again, sorry about that. I'm going to look for a more permanent, automated solution to this problem.

Thank you so much for all your help!

You are more than welcome, good luck with your project!