dmanty45 / bots

Automatically exported from code.google.com/p/bots

add: 'raw' editype #92

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
Raw editype:
no parsing or formatting is done.
The content is just read and passed to the translation (incoming)
or written to a file (outgoing).

Use cases:
- creating e.g. a PDF file in the translation and just writing it out.
- passing a text file to the translation and parsing it in the mapping.
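
A minimal sketch of what "nothing is done" amounts to, in plain Python. The names read_raw and write_raw are hypothetical and are not part of the bots API; the only assumption is that raw content is handled as bytes so that binary files such as PDFs pass through unchanged.

```python
from pathlib import Path


def read_raw(path):
    """Incoming side: return the file content as bytes, unparsed."""
    return Path(path).read_bytes()


def write_raw(path, content):
    """Outgoing side: write the bytes to a file exactly as given."""
    Path(path).write_bytes(content)
```

Working in bytes keeps the charset question out of the read/write step; any decoding would then happen in the mapping.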

Original issue reported on code.google.com by hjebb...@gmail.com on 20 Oct 2011 at 11:02

GoogleCodeExporter commented 8 years ago
Tested this in a production environment; it works well.

Original comment by mjg1964 on 21 Oct 2011 at 8:13

GoogleCodeExporter commented 8 years ago
Hi henk-jan,
Mapping raw input requires creating the folder bots/usersys/mapping/raw 
containing an __init__.py file.
This should probably be included in the upgrade plugin.
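
Until then, the folder can be created by hand. A small sketch using only the Python standard library; the helper name ensure_raw_mapping_package is hypothetical, and the location of the bots directory is passed in by the caller because it differs per installation.

```python
from pathlib import Path


def ensure_raw_mapping_package(bots_dir):
    """Create <bots_dir>/usersys/mapping/raw with an __init__.py inside.

    Safe to call repeatedly: existing folders and files are left alone.
    """
    package_dir = Path(bots_dir) / "usersys" / "mapping" / "raw"
    package_dir.mkdir(parents=True, exist_ok=True)
    # The (empty) __init__.py makes the folder an importable package.
    (package_dir / "__init__.py").touch(exist_ok=True)
    return package_dir
```

Call it with the path of the bots directory of the installation, for example ensure_raw_mapping_package("bots") when run from the directory that contains it.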

I have been testing this with the PDF preprocess; it works OK.

Kind Regards,
Mike

Original comment by mjg1964 on 12 Dec 2011 at 10:36

GoogleCodeExporter commented 8 years ago
Hi Mike,
You are right, that should be in.
Can you take a look at the character set? I was wondering whether that goes OK.

henk-jan

Original comment by hjebb...@gmail.com on 12 Dec 2011 at 11:18

GoogleCodeExporter commented 8 years ago
I am using iso8859-1 for input and output, receiving a PDF and converting it to 
text with just inn2out in the mapping script for now. This works OK. As mentioned 
on the PDF preprocess issue, I am now trying pdfminer for better results.

Original comment by mjg1964 on 13 Dec 2011 at 3:54

GoogleCodeExporter commented 8 years ago
Hi Mike,

Another option would be to use the character set from the channel.
iso8859-1 is OK most of the time, but e.g. Russia, Asia etc. use other sets. 
I work with a lot of character sets.
This would also be more in line with the rest of the editypes.
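
A short illustration of why a fixed iso8859-1 is risky, in plain Python. The sample text and the choice of cp1251 (a charset commonly used for Russian) are only illustrative and are not taken from any real channel configuration.

```python
text = "Привет"              # sample Cyrillic text (illustrative)
raw = text.encode("cp1251")  # bytes as a cp1251 sender would deliver them

# Decoding with the wrong charset does not fail: iso8859-1 maps every
# byte value to some character, so the result is silently wrong.
wrong = raw.decode("iso8859-1")

# Decoding with the charset the sender actually used gives the text back.
right = raw.decode("cp1251")

# Encoding Cyrillic text as iso8859-1 is not possible at all.
try:
    text.encode("iso8859-1")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

The silent case is the dangerous one: nothing raises an error, the output is simply garbled.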

henk-jan

Original comment by hjebb...@gmail.com on 13 Dec 2011 at 11:47

GoogleCodeExporter commented 8 years ago
Hi henk-jan,
I created this "raw to raw" mapping script as an example. It gets the charset 
from the input and output channels and uses these to decode and encode 
respectively. I don't have any files that require a special charset, so I have 
only tested this with ASCII files. This example just reads the input records 
and writes the output with no changes (apart from the charset).
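
The decode/encode step on its own can be sketched as below. This is not the attached script and does not use the bots API: the function name transcode and its parameters are hypothetical, and the two charset names are assumed to have been looked up from the channels already.

```python
def transcode(raw, in_charset, out_charset, errors="strict"):
    """Decode bytes with the incoming charset, encode with the outgoing one.

    With errors="strict", characters that cannot be represented in
    out_charset raise UnicodeEncodeError instead of being dropped silently.
    """
    text = raw.decode(in_charset, errors)
    return text.encode(out_charset, errors)
```

For pure ASCII input the bytes come out unchanged for any ASCII-compatible charset pair, which is why a test with only ASCII files cannot show charset problems.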

Actually, I envisage that in a route only one side would be "raw"; the other 
side would be some known format that bots has a grammar for. 
Raw input could be used e.g. for a raw to edifact mapping, with input from a 
free-format text file. 
Raw output could be used e.g. for an edifact to PDF document mapping using reportlab.

Original comment by mjg1964 on 26 Dec 2011 at 4:14

Attachments:

GoogleCodeExporter commented 8 years ago

Original comment by hjebb...@gmail.com on 22 Jun 2012 at 3:29

GoogleCodeExporter commented 8 years ago

Original comment by hjebb...@gmail.com on 10 Sep 2013 at 12:44