Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0

bug/eml file partitioning fails #1546

Closed andreskull closed 9 months ago

andreskull commented 11 months ago

Describe the bug

Partitioning an .eml file returns an empty list of elements.

To Reproduce

from unstructured.partition.auto import partition

elements = partition(filename="tehnopol.eml")
print("\n\n".join([str(el) for el in elements]))

The eml file is here:

Delivered-To: andres@gmail.com
Received: by 2002:a05:6a11:3edf:b0:4ec:ff50:6077 with SMTP id bv31csp28322pxc;
        Fri, 15 Sep 2023 00:24:00 -0700 (PDT)
X-Google-Smtp-Source: AGHT+IGmDXTWgDlaq3bugyvjBUUv/9y6WltwmS59eTiBRO7CRJ2/wCjvc+jYF2C17NkdQLoP+5Sj
X-Received: by 2002:a2e:97d0:0:b0:2bc:b88c:64ed with SMTP id m16-20020a2e97d0000000b002bcb88c64edmr845397ljj.12.1694762639764;
        Fri, 15 Sep 2023 00:23:59 -0700 (PDT)
ARC-Seal: i=2; a=rsa-sha256; t=1694762639; cv=pass;
        d=google.com; s=arc-20160816;
        b=gbwD1bzXia8HznA1d9aFUZKkKDFDzcJjrdeOYsWAotmTJV74BKd7cHIk9A+DCTFupW
         V0oLhqgtojdKHWPKFIdHgsc3tWzxYvTrGQqnamEPmwNat8pqLMKX+cImLxV4dbG+JbOr
         pLoqDWNJk/ybWilMK+rkEQBDJt9s/MvNs+B3EVVgyenoRgcYGIRXuLe1N6kqprAhXkV3
         4Vct+p0+3Eq7r5tif0b7vqbb+FyCIBP8UlL7xN2ALaYBnJnQJdFv7KrgZCqX6DPKnBpr
         ApBwVzRift0vDyqu7C9LXGxxh+wq+GCYqqdCSccVsEYjFzWT17WsXce5eoMd/hl2OAaa
         81pg==
ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
        h=mime-version:content-transfer-encoding:content-id:content-language
         :accept-language:message-id:date:thread-index:thread-topic:subject
         :cc:to:from:dkim-signature;
        bh=kRMPUCPDX5lYv8tT/JZEs3JUgP/Tu7Gg/XYh0qIBL88=;
        fh=J7vuPcG03kIA4kN/KVx2Z8Gp1t9JXH2My8kO+ha4n4k=;
        b=wRQQJuVNKPeufASYYWJ07dAnfZUCvz8foz4yFlMERHuovQNmcRSjvO6hDtdX883kxb
         uAPZHo8ki8BXK8gxOuZThKa3bl6JpKXuuiCD3EZckRe0yMU2ScUdDG2HlAPkvNW7EEJa
         p3bUkQuYY3VF2q/gpkTX31ZY2FDLzfuV+jaV9c+IbhFQjZkdsByAS/G9c3tpeqzOCvQm
         o53OC7DDQu897j/ntwDqsz2OnAsxtJMR3P4BrPgLSCYIya7b2/xCkwGISiIz98TGMuSZ
         JG/Wh3T5xHSXeKrgOXy3WfT5alxdOk2+rP2yQr6+SW3MHrkxPslx22iUyq4zA0c5VWpe
         wZjg==
ARC-Authentication-Results: i=2; mx.google.com;
       dkim=pass header.i=@tehnopol.onmicrosoft.com header.s=selector2-tehnopol-onmicrosoft-com header.b=S41S3zZU;
       arc=pass (i=1 spf=pass spfdomain=tehnopol.ee dkim=pass dkdomain=tehnopol.ee dmarc=pass fromdomain=tehnopol.ee);
       spf=pass (google.com: domain of otto@tehnop.ee designates 2a01:111:f400:fe0e::62b as permitted sender) smtp.mailfrom=otto@tehnop.ee
Return-Path: <otto@tehnop.ee>
Received: from EUR04-VI1-obe.outbound.protection.outlook.com (mail-vi1eur04on062b.outbound.protection.outlook.com. [2a01:111:f400:fe0e::62b])
        by mx.google.com with ESMTPS id ce23-20020a170906b25700b009a578c68686si2871808ejb.986.2023.09.15.00.23.58
        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
        Fri, 15 Sep 2023 00:23:59 -0700 (PDT)
Received-SPF: pass (google.com: domain of otto@tehnop.ee designates 2a01:111:f400:fe0e::62b as permitted sender) client-ip=2a01:111:f400:fe0e::62b;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@tehnopol.onmicrosoft.com header.s=selector2-tehnopol-onmicrosoft-com header.b=S41S3zZU;
       arc=pass (i=1 spf=pass spfdomain=tehnopol.ee dkim=pass dkdomain=tehnopol.ee dmarc=pass fromdomain=tehnopol.ee);
       spf=pass (google.com: domain of otto@tehnop.ee designates 2a01:111:f400:fe0e::62b as permitted sender) smtp.mailfrom=otto@tehnop.ee
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=cit3putl6GGiq5wE5eYNCGH+49w+XyJnJGkZ3CraglK4UvBsu7COFXyv2XPZVotUYj2b8FVgeby8TfDgYEsNj3Hf4jDLbOrgF4Y8T6mSa06wqtgydbDgBxKF0V/L9DKwhOJ1XrCZxXE41HY/VaOMB/N2dwoDx/g2zpAnsA5SkqZ1EM1fm3o6WIm0ILANCSw3NLujZ+S2Rqo4SUr2XENI/VCbNCO70R3MpZe2Tf3O13feJDk+/LmEb1Hw1Qdk73PuYYFz4kuu4gPGpVCkAsUaQ5mlUuXUoLbH6Xq7AbGNshTqBl/mYFuBcz4QALWiuKHbD9FKbY7nS7QoHWGBrcL45A==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=kRMPUCPDX5lYv8tT/JZEs3JUgP/Tu7Gg/XYh0qIBL88=;
 b=nnZBsh6xZUC/Lk7X7Buix+y83rOglcIZG7HuCE6ay0u5vkDmCMe7hLMriSeu4E1XM99dDqNHQx4rv+uEOnB6UTOQgRq4vPqQcy3MTtGaU/V73cMfqNtxjFmg4zD8Nk5QgC/LIy5GUgvLqodwAdxduHV9FHlHLi6Nf4f7Qnjnvmx0bviQO1VVuiLatYDbl9fWHjV64dAvnRntK4Q/jquq9Fa1UwOQmGLj4J0sXI22O+Ni5F7e6zayMnWhddmIVLEvHFCyJRcxmgLCpVea2BVKkZ83+KBam9dB7wUnXR4bvYKXnqQeKGgGnPebcdQhd41lBQPGtQDbk6W5PxDUgh3TNw==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=tehnopol.ee; dmarc=pass action=none header.from=tehnopol.ee;
 dkim=pass header.d=tehnopol.ee; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=tehnopol.onmicrosoft.com; s=selector2-tehnopol-onmicrosoft-com;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=kRMPUCPDX5lYv8tT/JZEs3JUgP/Tu7Gg/XYh0qIBL88=;
 b=S41S3zZUxUJ8w9Z8P7DWcSfSVQQ6PcBiZGWGS8CbdgquLz7fmu8fpJkGHbTHT4LyqX95SrVdo/IcezgxLaA0dwXOPyXfDNiyhgMZB4K+G5F2tZOIkIhvoavgnpSWX5vpgwx9BSw7kPn+0thCoxzq1vIEj11NcD3Zj+F9urT/HbY=
Received: from AM6PR07MB4485.eurprd07.prod.outlook.com (2603:10a6:20b:26::17)
 by AS4PR07MB8684.eurprd07.prod.outlook.com (2603:10a6:20b:4f2::8) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6768.31; Fri, 15 Sep
 2023 07:23:55 +0000
Received: from AM6PR07MB4485.eurprd07.prod.outlook.com
 ([fe80::19ba:9168:3b7a:48f0]) by AM6PR07MB4485.eurprd07.prod.outlook.com
 ([fe80::19ba:9168:3b7a:48f0%4]) with mapi id 15.20.6792.020; Fri, 15 Sep 2023
 07:23:55 +0000
From: =?utf-8?B?T3R0byBNw6R0dGFz?= <otto@tehnop.ee>
To: Ahto <ahto@gmail.com>
Subject:
 =?utf-8?B?VEVITk9QT0wgfMKgVmFhZGFrZSBvdHNlw7xsZWthbm5ldDogVGVobm9wb2xp?=
 =?utf-8?B?IEFJIHTDtsO2dG9hIGV0dGVrYW5kZWQ=?=
Thread-Topic:
 =?utf-8?B?VEVITk9QT0wgfMKgVmFhZGFrZSBvdHNlw7xsZWthbm5ldDogVGVobm9wb2xp?=
 =?utf-8?B?IEFJIHTDtsO2dG9hIGV0dGVrYW5kZWQ=?=
Thread-Index: AQHZ56WV9Jwlp0InkUGy7D241/HKug==
Date: Fri, 15 Sep 2023 07:23:55 +0000
Message-ID: <53130FE8-1BAE-4149-B003-7A4DC49C877A@tehnopol.ee>
Accept-Language: en-GB, en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
authentication-results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=tehnopol.ee;
x-ms-publictraffictype: Email
x-ms-traffictypediagnostic: AM6PR07MB4485:EE_|AS4PR07MB8684:EE_
x-ms-office365-filtering-correlation-id: f8cdbb78-d212-4acc-0913-08dbb5bcb836
x-ld-processed: a383bcc5-558f-4a11-8500-9ca39acb76cd,ExtAddr
x-ms-exchange-senderadcheck: 1
x-ms-exchange-antispam-relay: 0
x-microsoft-antispam: BCL:0;
x-microsoft-antispam-message-info:
 K2rounKf3kPKNbnHNe2HuXYjGTFBo9uVOxIy4jMltezahEsLueUS0eH/radBswTY9gj6F5foluaJHKCpiCh9QuiEVWck9Gqn3mJQNRyX+TFfQFRDksSit5JsQeCVz5fIecvUVlsuLJdogEbYn/vz8u2SKbL5Qfo2FMmRleMZgBqSzS7wTIaMvBvj6K8SuZWw2dWZZgB/fkRFW7nxlsatRh9R18IKG34ErC3SQrbJVFfq8dp1SXGVvkAlpl3Xwy4H7q6ib/ars9P6fK2IFABhM0PVdgCuUzPFu39n1K6ElQPVxTZXrO+WeESShxrq4OrD4UhvQlp3mqCr9tzPDmxsqdTT5hDiAR6S0N1sFGTnUdsgnsa02mlxQKM4XZRh89P67DP5I2HeSbNt1IYJDScAAsI+O3918EfHUS6miCwp3mfWfxqKTR3Pl1ABEPf1RqNsJqStGM2EWxa9jj/2iC3BwKCDCESks8+gViBH8X795FM5MPrMM89hRjeecduhXTCl7UKSWs83qp3nJv7FZqPD5+NRzE9zOlPgRNTMfBeGPRMgMnkJhXtzZmdebNC1CfWIZCAyma2dauiKJRcVL3RAqKEkfm8MDuUTvLC348giyiyl8dYWAabiOXUC9d3ytpMpOtvtGoIKjMyk0glvR1E9RTdVrN13YFlEa/1fS8BTk4JVTZONrFwm7ywOEkdivgmXe9kblD6XZzJpA+pC3tD4Zw==
x-forefront-antispam-report:
 CIP:255.255.255.255;CTRY:;LANG:et;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:AM6PR07MB4485.eurprd07.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230031)(366004)(396003)(136003)(39840400004)(376002)(346002)(451199024)(186009)(1800799009)(41300700001)(122000001)(6486002)(6506007)(71200400001)(76116006)(921005)(110136005)(224303003)(85202003)(36756003)(86362001)(38100700002)(38070700005)(33656002)(85182001)(7276002)(4326008)(478600001)(7406005)(2616005)(2906002)(54906003)(64756008)(1191002)(6512007)(83380400001)(91956017)(7416002)(8936002)(66476007)(7366002)(5660300002)(966005)(7336002)(26005)(66446008)(316002)(66946007)(107886003)(66556008)(45980500001);DIR:OUT;SFP:1101;
x-ms-exchange-antispam-messagedata-chunkcount: 1
x-ms-exchange-antispam-messagedata-0:
 =?utf-8?B?RncxZ3hxbXowVi9UcmkvSVVjblM3eDJPTjl3dTlacjB5MWxhR3lQUjBlcm8y?=
 =?utf-8?B?Yi9HcGM5RloyYlRXSGZsdFdpWUkvWTFFM3pSZ2hWV1hIYUlySlg0cVk0OG1a?=
 =?utf-8?B?ZVlISGVtZnRGN2dzbk9aSDFuZmhHRXVSbXNNSmd1elFJdlczTkl6dTIvWi8y?=
 =?utf-8?B?MG9FRDV3N2paWitSQStOTkMyL2dKSVArbjBxYWJXMy9sakt4UEI2TnpQYlQ3?=
 =?utf-8?B?SGs3WU9KL28xaTJsYVEvc0pZTG9MV2FWZnAxUUVrelkxdE1wWUJQa2lQZTdP?=
 =?utf-8?B?eUlwRnFLMG9LSUx0WE41Ny8zRm5oTmo2bStQL1VOZUMwNit3SGUvaWprb0JD?=
 =?utf-8?B?MTRZMVNndGhOblNWVmpLMHJGbm8ycWVkV2dXb0hGQkx2bjE5eDNxVDEwbU81?=
 =?utf-8?B?UFNaYU4xWGU4Ni9vZjRlbG1aTmFyK0VXcUJUVUFsaFA4bzRXWnVaV2tiZ0w4?=
 =?utf-8?B?S1hjWUdEakxRV0N3QzFod1JLRHV1NEtWc2xKQjVZRTBpTXlueFo1UVBqRDl0?=
 =?utf-8?B?ODFpNGpIWHkyUmdxNGRTdDhDd3lHTmdjQUkwUGp3MHROdTVJelNYVVU1R3N1?=
 =?utf-8?B?OVppWUNGcFdabFBzaGt4TE5raGtVMHllT0pFelRiend3VUpKVllaekVBMVBx?=
 =?utf-8?B?NDVqUFU2SHVJQklqQ0J6bUxHNytBeFR0L2FEUFNEdkRya2xkWUI4UnZqcjJR?=
 =?utf-8?B?UlM5MVkvVTZISkFVL1FSVlZsQWlPZHE5SWhqcUwwT1JsVWg5elNQcFdoSisr?=
 =?utf-8?B?TnJjVlNBUzRqSklTaFVNWlVnSlByM2swVFUwT1RlcVRPbk54bXY0eGtxS1d2?=
 =?utf-8?B?dEhrRjNYS3lYVFBGVW4vQnpScnZqWTZTb0dsSGc5Zm0zOHFEQTZzRzkxQ0pJ?=
 =?utf-8?B?MEZSSEtSblYyc21EcEZyMXFkSUgzV0ZyZGJvM1d2UzJtWFhtQitGcVlQZmZM?=
 =?utf-8?B?V1AwV3VVWVZZYWlSdUpuR1I0S050cnByN3JmOE5EOS9WUGozQUc0eCs4d1pv?=
 =?utf-8?B?cURsYzZscmU5SFRIQllXc1hWR0NzUjdHdlJJdW9lS0VRVjdIeWpVc2tzMXFD?=
 =?utf-8?B?ZTcvVVRGcUhZVXJvTjNBNi9tL0JmZHNOcEJSNEVsZm11MkhjemtPVUJEWkt0?=
 =?utf-8?B?akJBbUNrc2E1dXIxTlUwRmFVYXI0dHc5VFFGRUszWXRhbUxteFFxVnp0MGdN?=
 =?utf-8?B?STJGM3VCUitOc2I1WlBrMVhnOHdRMFRNV3c1M1JsNWZMKzRNdjhNbVVXc1ZG?=
 =?utf-8?B?VU5FRDRHNGlTQnBUajR2NkZMWTlnM3g3TEQ5Z0NZYzFjaWVCOVBhcUVldSs3?=
 =?utf-8?B?a1AzYXhTUjlMdmJxVkJKTzJ5bEQxSC9JWUtNWC8wUm1RdmdZajZZZWxxcm9t?=
 =?utf-8?B?WlMvN0d6eGNPVGRxNHI5UmlGN2ZGaG9WQ0VXYnZCNTI2WUtyL0Fkai9lcDkw?=
 =?utf-8?B?b3A0VUd4UGFPYmVBVnliYVNuVU11T2d5a29vKy9JaWMycEtFaDRPNW11UXpI?=
 =?utf-8?B?bnIwUS9menpZS1lDOVJJbExpeGFvR1NYMlhHc0ZMZVJvenloYWsxQ1lVR3hw?=
 =?utf-8?B?NjEvMWlSOW14NWJrSVdSNUt1U0Z0MHU0Vkw4Z3d2ZE1Vbm9HajNJOHUybWRm?=
 =?utf-8?B?djlqMUxMWVZVWnl3NVg2SHZPRWUxVlpvN3RmRnQ5UDhjZDJMVUpSMzVXU0Zj?=
 =?utf-8?B?M1JxSkhpK3RKTHMxZE1tNklJcklCTW9vVnAwMVNUeExMQ2V5aDEyWGpOUU5j?=
 =?utf-8?B?dmpqY3k5Y3IzR2dtMUFnemlWVkhDc1VvTU0yTjFhOVFVblpwaURRV2RHV2Ix?=
 =?utf-8?B?Y2NTaDZzTHpJbVhQSG0ydUZwR1pQRmhSelhhZDQ2alFBaE55aVpVNDFJVmZ3?=
 =?utf-8?B?anpUU0FwRkdWTTFZYTlKRlJCWEVRQnVHV2VqbFJCdGtUR1JOZ2xFME1lWHRK?=
 =?utf-8?B?b0FaOXd2eWZmdEpvSC84Y2MwSGtaQkJZQVZDSWJmc28wMmkxeWM1SllPWEtl?=
 =?utf-8?B?Q3ErWWVvTC9IVHJ1UDFsejlDS2tVcnhoUnB2bjMzM3pCTDV5Z0pRS2pzRTFa?=
 =?utf-8?B?U1RsMXlkcVVPcnZTMjdVZWJKWXFuNEZnWlZHSjlVb3ZvLzlobmt4NGdDWXc0?=
 =?utf-8?B?S1ZmVy9JeFl5ZkxWZFYzOStjby9wWWVsVU9hWStJV1ZCWVBSWndieEI4RkVs?=
 =?utf-8?B?Wmc9PQ==?=
Content-Type: text/plain; charset="utf-8"
Content-ID: <EECD45DFDB37804C8431AD6F4F9E7DA5@eurprd07.prod.outlook.com>
Content-Transfer-Encoding: base64
MIME-Version: 1.0
X-OriginatorOrg: tehnopol.ee
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: AM6PR07MB4485.eurprd07.prod.outlook.com
X-MS-Exchange-CrossTenant-Network-Message-Id: f8cdbb78-d212-4acc-0913-08dbb5bcb836
X-MS-Exchange-CrossTenant-originalarrivaltime: 15 Sep 2023 07:23:55.1817
 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: a383bcc5-558f-4a11-8500-9ca39acb76cd
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: TTxWJVObpIm1CUhEn4XDmQq+8+q+bBBM8Jcggo3pD0l4xFLkj6nr3q9aSciRw2DGotX/cUX5j+cRtAPpR7rW1A==
X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS4PR07MB8684

LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tDQpGb3IgRW5nbGlzaCwgc2VlIGJlbG93DQot
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0NCg0KVGVyZSwgQUkgZW50dXNpYXN0aWQNCg0K
w4RyZ2UgasOkdGtlIGthc3V0YW1hdGEgdsO1aW1hbHVzdCB2YWFkYXRhIG1laWUgdMOkbmFzZSBB
SSB0w7bDtnRvYSBvdHNlw7xsZWthbm5ldCBQb3N0aW1laGUgVGVhZHVzcG9ydGFhbGkgdmFoZW5k
dXNlbCEgTGlpdHVnZSwgZXQgc2FhZGEgYWltaSwgbWlsbGVnYSBpbm5vdmFhdGlsaXNlZCBldHRl
dsO1dHRlZCB0w6RuYSB0ZWdlbGV2YWQuDQpQw6RldmEgbMO1cHVzIGFudGFrc2UgdsOkbGphIHN1
dXJ1c2rDpHJndXMgMzAwIDAwMCDigqwgdG9ldHVzdCB0dWdldmFtYXRlIHByb2pla3RpZGUgZWxs
dXZpaW1pc2Vrcy4NCg0K8J+UlyBPdHNlw7xsZWthbmRlIGxpbms6IGh0dHBzOi8vdGVobmlrYS5w
b3N0aW1lZXMuZWUvNzg1NTE5Ny9vdHNldWxla2FubmUtZXR0ZXZvdHRlZC1sYW92YWQtaGluZGFq
YXRlLWV0dGUtbGV0dGktb21hLXN1dXJlZC1zYWxhZHVzZWQtamEtYXJpcGxhYW5pZA0KDQpBSkFL
QVZBDQoxMC40NSAtIEFrdHNpYXNlbHRzIEhhbnNhYg0KMTAuNTUgLSBBdWdtZW50YWwgVGVjaG5v
bG9naWVzIE/DnA0KMTEuMDUgLSBEaWdpdGFsIFNwdXRuaWsgTGlnaHRpbmcgT8OcDQoxMS4xNSAt
IEZvbGRlcml0IE/DnA0KMTEuMjUgLSBHU2NhbiBPw5wNCjExLjQ1IC0gTW90aWNoZWNrIE/DnA0K
MTEuNTUgLSBQb2xpdGUgT8OcDQoxMi4wNSAtIFJleHBsb3JlciBPw5wNCjEyLjE1IC0gU2V2ZW50
aCBTZW5zZSBPw5wNCjEzLjAwIC0gU291bmRmcmVlIE/DnA0KMTMuMTAgLSBUw6RuYXZhcHVoYXN0
dXNlIEFrdHNpYXNlbHRzDQoxMy4yMCAtIFdlcmsgSVQgT8OcDQoxMy4zMCAtIERyaXZlWCBUZWNo
bm9sb2dpZXMgT8OcDQoNCkphZ2FnZSBzZWRhIGxpbmtpIHZhYmFsdCBrw7VpZ2lnYSwga2VzIG9u
IGh1dml0YXR1ZCBBSSBsYWhlbmR1c3Rlc3QuIEvDvHNpbXVzdGUga29ycmFsIHbDtXRrZSDDvGhl
bmR1c3QgT3R0byBNw6R0dGFzZWdhIGFhZHJlc3NpbCBvdHRvLm1hdHRhc0B0ZWhub3BvbC5lZS4N
Cg0KTsOkZW1lIGludGVybmV0aXMhDQpQw7zDvGRrZW0gcMOkZXZhLA0KT3R0byBNw6R0dGFzDQoN
Ci0tLS0tLS0tLS0tLS0tLS0tLS0tLS0NCg0KSGVsbG8gQUkgRW50aHVzaWFzdHMsDQpEb24ndCBt
aXNzIG91dCBvbiB0aGUgbGl2ZSB3ZWJjYXN0IG9mIG91ciB1cGNvbWluZyBBSSBXb3Jrc2hvcCB2
aWEgUG9zdGltZWVzISBUdW5lIGluIHRvIGdhaW4gaW5zaWdodHMgZnJvbSBpbmR1c3RyeSBleHBl
cnRzIGFuZCBpbm5vdmF0aXZlIGNvbXBhbmllcy4NCkF0IHRoZSBlbmQgb2YgdGhlIGRheSwgMzAw
IDAwMCDigqwgd2lsbCBnaXZlbiBvdXQgdG8gc3VwcG9ydCB0aGUgc3Ryb25nZXIgcHJvamVjdHMg
YW5kIHRoZWlyIGV4ZWN1dGlvbi4NCg0K8J+UlyBXZWJjYXN0IExpbms6IGh0dHBzOi8vdGVobmlr
YS5wb3N0aW1lZXMuZWUvNzg1NTE5Ny9vdHNldWxla2FubmUtZXR0ZXZvdHRlZC1sYW92YWQtaGlu
ZGFqYXRlLWV0dGUtbGV0dGktb21hLXN1dXJlZC1zYWxhZHVzZWQtamEtYXJpcGxhYW5pZA0KDQpT
Q0hFRFVMRQ0KMTAuNDUgLSBBa3RzaWFzZWx0cyBIYW5zYWINCjEwLjU1IC0gQXVnbWVudGFsIFRl
Y2hub2xvZ2llcyBPw5wNCjExLjA1IC0gRGlnaXRhbCBTcHV0bmlrIExpZ2h0aW5nIE/DnA0KMTEu
MTUgLSBGb2xkZXJpdCBPw5wNCjExLjI1IC0gR1NjYW4gT8OcDQoxMS40NSAtIE1vdGljaGVjayBP
w5wNCjExLjU1IC0gUG9saXRlIE/DnA0KMTIuMDUgLSBSZXhwbG9yZXIgT8OcDQoxMi4xNSAtIFNl
dmVudGggU2Vuc2UgT8OcDQoxMy4wMCAtIFNvdW5kZnJlZSBPw5wNCjEzLjEwIC0gVMOkbmF2YXB1
aGFzdHVzZSBBa3RzaWFzZWx0cw0KMTMuMjAgLSBXZXJrIElUIE/DnA0KMTMuMzAgLSBEcml2ZVgg
VGVjaG5vbG9naWVzIE/DnA0KDQpGZWVsIGZyZWUgdG8gc2hhcmUgdGhpcyBsaW5rIHdpdGggYW55
b25lIGludGVyZXN0ZWQgaW4gQUkgc29sdXRpb25zLiBGb3IgYW55IHF1ZXN0aW9ucywgcmVhY2gg
b3V0IHRvIE90dG8gTcOkdHRhcyBhdCBvdHRvLm1hdHRhc0B0ZWhub3BvbC5lZS4NCg0KU2VlIHlv
dSBvbmxpbmUhDQpTZWl6ZSB0aGUgZGF5LA0KT3R0byBNw6R0dGFz

Environment Info

unstructured-inference 0.5.5
unstructured 0.10.15

yuming-long commented 11 months ago

Hi there!

You need to pass content_source="text/plain" to your partition call, since the content type of the email body is text/plain. Here is a ticket for auto-detecting the content source of an email: https://github.com/Unstructured-IO/unstructured/issues/1504.
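A minimal sketch of that suggestion, under some assumptions: SAMPLE_EML is a hypothetical, shortened stand-in for the message in this issue, and leaf_content_types and partition_plain_text_email are illustrative helper names, not part of unstructured. The first helper uses only the standard library to show that the message has a single text/plain part and no text/html part, which is the likely reason a partitioner looking for HTML content finds nothing. The second calls partition_email directly with content_source="text/plain"; calling partition_email rather than the auto partition function is a choice made here, and the import is lazy so the rest runs without unstructured installed.

```python
from email import message_from_string

# Hypothetical, shortened stand-in for the message in this issue: a single
# text/plain part whose body is base64-encoded UTF-8 text.
SAMPLE_EML = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "VGVyZSwgQUkgZW50dXNpYXN0aWQ=\n"
)


def leaf_content_types(raw_eml: str) -> list[str]:
    """Return the content type of every non-container part of the message."""
    msg = message_from_string(raw_eml)
    return [
        part.get_content_type()
        for part in msg.walk()
        if not part.is_multipart()
    ]


def partition_plain_text_email(path: str):
    """Partition an .eml file whose body is text/plain (illustrative helper).

    unstructured is imported lazily so the stdlib part of this sketch
    still runs when the library is not installed.
    """
    from unstructured.partition.email import partition_email

    # Ask for the text/plain body explicitly instead of the HTML body.
    return partition_email(filename=path, content_source="text/plain")


print(leaf_content_types(SAMPLE_EML))  # ['text/plain']
```

With unstructured installed, partition_plain_text_email("tehnopol.eml") would be the equivalent of the call in the report with the content source set explicitly.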

It also looks like the text is encoded, so you may also need to specify the encoding via the encoding parameter to decode the text.
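For reference, the headers of the attached message describe two separate layers: Content-Transfer-Encoding: base64 (how the bytes are wrapped for transport) and charset="utf-8" (how those bytes map to characters). The sketch below undoes both with the standard library only. RAW is a hypothetical one-line stand-in for the real body, and decode_body is an illustrative helper name. Whether the encoding parameter mentioned above touches the base64 layer is not established in this thread; the assumption here is that it names the character encoding only.

```python
from email import message_from_string

# Hypothetical stand-in: same Content-Type and transfer encoding as the
# message in this issue, with a one-line base64 body.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "T3R0byBNw6R0dGFz\n"
)


def decode_body(raw_eml: str) -> str:
    """Decode the body of a single-part message to a Python string."""
    msg = message_from_string(raw_eml)
    # decode=True undoes the Content-Transfer-Encoding (base64 here)
    # and returns raw bytes.
    payload = msg.get_payload(decode=True)
    # The charset parameter of Content-Type says how to read those bytes;
    # fall back to UTF-8 if the header does not declare one.
    charset = msg.get_content_charset() or "utf-8"
    return payload.decode(charset)


print(decode_body(RAW))
```

This handles single-part messages only; a multipart message would need a walk over its parts first.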

scanny commented 9 months ago

@andreskull I'm closing this as inactive, assuming you got this working. Feel free to reopen if you're still having a problem with this :)